How Agenticness Works
Agenticness measures how independently an AI tool can take action in a loop. It's a spectrum of autonomy, not a quality rating. A simple Advisor can be exactly the right tool for your task.
What is agenticness?
An "agentic" AI tool doesn't just answer questions: it takes action. It can call APIs, modify files, send messages, and make decisions about what to do next. The more independently it operates, the higher its agenticness level. We evaluate every tool in the directory on this spectrum so you can find the right fit for how much autonomy you want.
The Five Levels
Advisor
Answers questions and makes suggestions, but you take every action.
Helper
Executes simple tasks you assign, one step at a time.
Copilot
Works alongside you, handling multi-step tasks with some independence.
Operator
Runs complex workflows autonomously, checking in when needed.
Executive
Sets its own goals and operates independently across systems.
What counts as agentic?
To be considered agentic, a tool must be able to take at least some real-world action, not just generate text. A chatbot that only answers questions scores at or below Level 0 (not labeled). A tool that can call APIs, write code, or modify data starts at Advisor and goes up from there based on how independently it operates.
Five Scoring Dimensions
Each tool is evaluated across five dimensions, scored 0–4 each, for a total of 0–20 points. The total maps to a named level.
Action Capability
Can the tool execute real-world actions (API calls, file operations, sending emails), or does it only generate text?
Autonomy of Control Flow
Does it choose next steps dynamically based on context, or follow rigid, pre-defined scripts?
Adaptation and Recovery
How does it handle errors, unexpected situations, and edge cases? Can it find alternative approaches?
State and Continuity
Does it maintain context and memory across interactions, or start fresh every time?
Safety and Observability
Does it have guardrails, permission systems, audit trails, and human oversight mechanisms?
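The scoring model described above can be sketched in a few lines of code. The five dimension names come from this page; the numeric thresholds that map a total score to a named level are illustrative assumptions, since the directory does not publish its exact cut-offs.

```python
# Sketch of the scoring model: five dimensions, each scored 0-4,
# summed to a 0-20 total that maps to a named level.
# The thresholds below are assumptions, not published cut-offs.

DIMENSIONS = (
    "action_capability",
    "autonomy_of_control_flow",
    "adaptation_and_recovery",
    "state_and_continuity",
    "safety_and_observability",
)

# Hypothetical score-to-level mapping (ascending thresholds).
LEVELS = [
    (0, "Not labeled"),   # no real-world action capability
    (1, "Advisor"),
    (5, "Helper"),
    (9, "Copilot"),
    (13, "Operator"),
    (17, "Executive"),
]

def level_for(scores: dict) -> str:
    """Sum the five 0-4 dimension scores and map the total to a level."""
    total = sum(scores[d] for d in DIMENSIONS)
    assert 0 <= total <= 20, "each dimension must be scored 0-4"
    name = LEVELS[0][1]
    for threshold, label in LEVELS:
        if total >= threshold:
            name = label
    return name
```

For example, a tool scoring 2 on every dimension (total 10) would land at Copilot under these assumed thresholds.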
Evidence and Confidence
Every evaluation is backed by specific evidence from the tool's website and documentation. We also track what information was missing or unclear. Each evaluation carries a confidence level (High, Medium, or Limited) so you know how much evidence was available. Tools with Limited evidence are scored conservatively.
The Alien Badge 👽
A rare designation for Executive-level tools that also demonstrate strong safety practices and have high-confidence evaluations. The Alien badge means: this tool operates at the frontier of autonomy and has the guardrails to back it up.
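The badge criteria above are a simple conjunction, sketched here. The Executive level and High confidence requirements come from the text; treating "strong safety practices" as a score of at least 3 on the 0–4 safety dimension is an assumption.

```python
# Sketch of the Alien badge check. The safety threshold (>= 3 on the
# 0-4 safety dimension) is an assumption; the level and confidence
# requirements are stated on this page.

def qualifies_for_alien_badge(level: str, safety_score: int, confidence: str) -> bool:
    """Executive-level autonomy + strong safety + high-confidence evidence."""
    return level == "Executive" and safety_score >= 3 and confidence == "High"
```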
Freshness
AI tools evolve fast. Each listing shows when it was last evaluated, so you can tell if the score reflects the current state of the product. We re-evaluate listings periodically and whenever a vendor updates their content.
FAQ
Is a higher level always better?
No. Agenticness is a spectrum, not a ranking. An Advisor that answers questions well may be exactly what you need. More autonomy means more power but also more complexity and risk.
How is this different from a star rating?
Star ratings measure quality or satisfaction. Agenticness measures capability and independence. A 5-star tool might be an Advisor, and a poorly-reviewed tool might be an Operator.
Can vendors influence their score?
No. Scores are generated by AI evaluation of publicly available information. Vendors can update their content, which triggers a re-evaluation, but they cannot directly set or override scores.
What does the safety caution mean?
If a tool has strong action capability (can do a lot) but weak safety controls (limited guardrails), we flag this so you can make an informed decision. It's not a disqualification, just something to be aware of.
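The caution flag is a comparison between two of the dimension scores. A minimal sketch, assuming the flag fires when the 0–4 action score is high and the 0–4 safety score is low; the exact cut-offs are assumptions.

```python
# Sketch of the safety-caution flag: strong action capability paired
# with weak safety controls. The numeric cut-offs are assumptions.

def safety_caution(action_score: int, safety_score: int) -> bool:
    """Flag tools that can do a lot but have limited guardrails."""
    return action_score >= 3 and safety_score <= 1
```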