How Agenticness Works
Agenticness measures how independently an AI tool can take action in a loop. It's a spectrum of autonomy, not a quality rating. A simple Advisor can be exactly the right tool for your task.
What is agenticness?
An "agentic" AI tool doesn't just answer questions: it takes action. It can call APIs, modify files, send messages, and make decisions about what to do next. The more independently it operates, the higher its agenticness level. We evaluate every tool in the directory on this spectrum so you can find the right fit for how much autonomy you want.
The Five Levels
Advisor
Answers questions and makes suggestions, but you take every action.
Helper
Executes simple tasks you assign, one step at a time.
Copilot
Works alongside you, handling multi-step tasks with some independence.
Operator
Runs complex workflows autonomously, checking in when needed.
Executive
Sets its own goals and operates independently across systems.
What counts as agentic?
To be considered agentic, a tool must be able to take at least some real-world action, not just generate text. A chatbot that only answers questions scores at or below Level 0 (not labeled). A tool that can call APIs, write code, or modify data starts at Advisor and goes up from there based on how independently it operates.
Five Scoring Dimensions
Each tool is evaluated across five dimensions, scored 0–4 each, for a total of 0–20 points. The total maps to a named level.
Action Capability
Can the tool execute real-world actions (API calls, file operations, sending emails), or does it only generate text?
Autonomy of Control Flow
Does it choose next steps dynamically based on context, or follow rigid, pre-defined scripts?
Adaptation and Recovery
How does it handle errors, unexpected situations, and edge cases? Can it find alternative approaches?
State and Continuity
Does it maintain context and memory across interactions, or start fresh every time?
Safety and Observability
Does it have guardrails, permission systems, audit trails, and human oversight mechanisms?
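The scoring model described above can be sketched in a few lines of code. The five dimension names come from this page; the numeric thresholds that map a total score to a named level are illustrative assumptions, since the directory does not publish its exact cut-offs.

```python
# Sketch of the scoring model: five dimensions, each scored 0-4,
# summed to a 0-20 total that maps to a named level.
# The thresholds below are assumptions, not published cut-offs.

DIMENSIONS = (
    "action_capability",
    "autonomy_of_control_flow",
    "adaptation_and_recovery",
    "state_and_continuity",
    "safety_and_observability",
)

# Hypothetical score-to-level mapping (ascending thresholds).
LEVELS = [
    (0, "Not labeled"),   # no real-world action capability
    (1, "Advisor"),
    (5, "Helper"),
    (9, "Copilot"),
    (13, "Operator"),
    (17, "Executive"),
]

def level_for(scores: dict) -> str:
    """Sum the five 0-4 dimension scores and map the total to a level."""
    total = sum(scores[d] for d in DIMENSIONS)
    assert 0 <= total <= 20, "each dimension must be scored 0-4"
    name = LEVELS[0][1]
    for threshold, label in LEVELS:
        if total >= threshold:
            name = label
    return name
```

For example, a tool scoring 2 on every dimension (total 10) would land at Copilot under these assumed thresholds.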
Evidence and Confidence
Every evaluation is backed by specific evidence from the tool's website and documentation. We also track what information was missing or unclear. Each evaluation carries a confidence level (High, Medium, or Limited) so you know how much evidence was available. Tools with Limited evidence are scored conservatively.
The Alien Badge 👽
A rare designation for Executive-level tools that also demonstrate strong safety practices and have high-confidence evaluations. The Alien badge means: this tool operates at the frontier of autonomy and has the guardrails to back it up.
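The badge criteria above are a simple conjunction, sketched here. The Executive level and High confidence requirements come from the text; treating "strong safety practices" as a score of at least 3 on the 0–4 safety dimension is an assumption.

```python
# Sketch of the Alien badge check. The safety threshold (>= 3 on the
# 0-4 safety dimension) is an assumption; the level and confidence
# requirements are stated on this page.

def qualifies_for_alien_badge(level: str, safety_score: int, confidence: str) -> bool:
    """Executive-level autonomy + strong safety + high-confidence evidence."""
    return level == "Executive" and safety_score >= 3 and confidence == "High"
```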
Freshness
AI tools evolve fast. Each listing shows when it was last evaluated, so you can tell if the score reflects the current state of the product. We re-evaluate listings periodically and whenever a vendor updates their content.
FAQ
Is a higher level always better?
No. Agenticness is a spectrum, not a ranking. An Advisor that answers questions well may be exactly what you need. More autonomy means more power but also more complexity and risk.
How is this different from a star rating?
Star ratings measure quality or satisfaction. Agenticness measures capability and independence. A 5-star tool might be an Advisor, and a poorly-reviewed tool might be an Operator.
Can vendors influence their score?
No. Scores are generated by AI evaluation of publicly available information. Vendors can update their content, which triggers a re-evaluation, but they cannot directly set or override scores.
What does the safety caution mean?
If a tool has strong action capability (can do a lot) but weak safety controls (limited guardrails), we flag this so you can make an informed decision. It's not a disqualification, just something to be aware of.
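The caution flag is a comparison between two of the dimension scores. A minimal sketch, assuming the flag fires when the 0–4 action score is high and the 0–4 safety score is low; the exact cut-offs are assumptions.

```python
# Sketch of the safety-caution flag: strong action capability paired
# with weak safety controls. The numeric cut-offs are assumptions.

def safety_caution(action_score: int, safety_score: int) -> bool:
    """Flag tools that can do a lot but have limited guardrails."""
    return action_score >= 3 and safety_score <= 1
```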