Office Hours — How do you know if AI agents will choose your tool?
A daily developer question about AI/LLMs, answered with a direct, opinionated take.
How do you know if AI agents will choose your tool?
You don’t, not upfront. But you can make it more likely by designing for three things. First, your tool has to solve a bottleneck in the agent’s reasoning loop: something that makes the task faster or more reliable than doing it without you. Second, the agent needs to understand when to call you, which means clear, unambiguous function signatures and documentation. Claude Opus 4.7 and GPT-5.5 are both capable at tool use, but they still struggle with subtle conditionals like “call this when X, but not when Y unless Z.” Third, your tool has to return signals the agent can actually parse. Ambiguous outputs, or side effects the model can’t observe, kill adoption faster than anything else.
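To make the second point concrete, here is a minimal sketch of an unambiguous tool signature, written as a JSON-Schema-style function definition expressed in Python. The tool name, parameters, and the referenced run_tests and promote_release tools are all hypothetical; the point is that the description states exactly when to call the tool and what each field means.

# Hypothetical tool definition in the JSON-Schema style that most
# function-calling APIs accept. The description spells out when the
# tool should and should not be called.
deploy_tool = {
    "name": "deploy_service",
    "description": (
        "Deploy a built artifact to the staging environment. "
        "Call this only after run_tests has returned exit_code 0. "
        "Never call it for production; use promote_release instead."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "service": {
                "type": "string",
                "description": "Service name, e.g. 'checkout-api'",
            },
            "version": {
                "type": "string",
                "description": "Git SHA or semver tag of the build to deploy",
            },
        },
        "required": ["service", "version"],
    },
}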
Designing for Agent Workflows
The real tell is whether the agent naturally chains your tool into sequences with other actions. If it’s one-shot—call your tool, done—you’re probably not solving a critical enough problem. Agents pick tools they use repeatedly across multi-step workflows. GitHub Copilot’s underlying agents use git, test runners, and linters constantly because those tools have fast feedback loops and unambiguous success signals. An API that returns data the agent can’t verify or act on becomes dead weight.
Take a concrete example: a deployment tool that returns only “success” or “failure” won’t see much agent adoption. But one that returns structured logs, service health metrics, and a rollback command gets woven into autonomous pipelines. Claude Code or Cursor Agent will call it, parse the output, check for actual service stability, and decide whether to proceed or revert—all without human intervention.
// Tool that agents will ignore
POST /deploy
Response: { "status": "success" }
// Tool that agents adopt repeatedly
POST /deploy
Response: {
  "status": "success",
  "deployment_id": "d-12345",
  "health_checks": {
    "cpu_percent": 45,
    "error_rate": 0.002,
    "latency_p99_ms": 142
  },
  "previous_state": "easily_reversible",
  "rollback_command": "curl -X POST /rollback/d-12345"
}
The second tool gives the agent something to reason about. It can verify the deployment actually worked by checking metrics, not just trusting a status field. That’s adoptable.
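Here is a rough sketch of the agent-side check that such a response enables, assuming the response shape above; the threshold values are invented and would depend on the service.

import json

# Hypothetical thresholds; real values depend on the service.
MAX_ERROR_RATE = 0.01
MAX_LATENCY_P99_MS = 500

def verify_deployment(response_body: str) -> tuple[bool, str | None]:
    """Check the structured deploy response and decide whether to proceed or revert."""
    result = json.loads(response_body)
    checks = result.get("health_checks", {})
    healthy = (
        result.get("status") == "success"
        and checks.get("error_rate", 1.0) <= MAX_ERROR_RATE
        and checks.get("latency_p99_ms", float("inf")) <= MAX_LATENCY_P99_MS
    )
    # If unhealthy, the response itself tells the agent how to revert,
    # so there is no guessing at an undo procedure.
    return healthy, None if healthy else result.get("rollback_command")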
The Adoption Signal
Watch production logs. If agents are calling your tool but ignoring the output, or calling it redundantly because they don’t trust the result, that’s the real signal that something is off. What you want to see is adoption that reduces hallucination or speeds up convergence to a correct answer. If the agent can accomplish the same task without your tool, it eventually will.
There’s a harder tradeoff here: making a tool that agents can understand often means stripping away nuance. Agents don’t handle “it depends” well. They need crisp boundaries. A tool for “check if this refactor is safe” won’t work for agents because safety is subjective. A tool for “run these tests and return exit code plus stderr” will, because success is binary. You may have to redesign the tool’s contract around what agents can verify, not what humans find useful.
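A sketch of what that crisper contract might look like, as a thin wrapper around a test runner; the default command, timeout, and response fields are placeholders, not a prescribed interface.

import subprocess

def run_tests(command: list[str] | None = None, timeout_s: int = 300) -> dict:
    """Run the test suite and return a result the agent can verify directly."""
    command = command or ["pytest", "-q"]
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    return {
        # Binary success signal: nothing subjective to interpret.
        "exit_code": proc.returncode,
        "passed": proc.returncode == 0,
        # Trimmed stderr gives the agent something concrete to act on when tests fail.
        "stderr": proc.stderr[-4000:],
    }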
Keep your feedback loop tight, too. If your tool takes 30 seconds to return a result, agents will call it sparingly or batch it in parallel to avoid blocking. If it takes 5 seconds, they’ll call it in series and trust the results more. Agents are impatient with latency: every blocked step stretches the loop, and the intermediate state they have to carry lives in a finite context window.
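In agent-loop terms, the difference looks roughly like this; the tool names and timings are invented to illustrate the pattern, not taken from any real agent.

import asyncio

async def call_tool(name: str, delay_s: float) -> str:
    """Stand-in for a tool call; delay_s simulates the tool's latency."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"

async def main() -> None:
    # Slow tools get batched and fired concurrently so the loop isn't blocked;
    # a fast tool can simply be awaited in sequence between reasoning steps.
    slow_results = await asyncio.gather(
        call_tool("security_scan", 30), call_tool("load_test", 30)
    )
    fast_result = await call_tool("deploy_staging", 5)
    print(slow_results, fast_result)

asyncio.run(main())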
Bottom line: Design for verifiable outcomes and tight feedback loops, not just accessibility. Agents adopt tools that measurably improve their reasoning and speed up convergence, not tools that are merely convenient. Make it so the agent can see whether your tool worked, not just that it ran.
Question via Hacker News