Office Hours — How should AI agents discover and order real-world services?
A daily developer question about AI/LLMs, answered with a direct, opinionated take.
How should AI agents discover and order real-world services?
This is the hard problem nobody’s talking about. Most agent frameworks assume tools arrive pre-integrated—you hand the agent a list of APIs and call it a day. But real services sprawl: hotel booking APIs, logistics networks, payment processors, compliance validators. The agent needs to know what exists, whether it’s available, what constraints apply, and how to sequence calls without breaking state.
The Discovery Problem
Right now, agents discover services one of two ways, both fragile. Either you hardcode a list of tool definitions into the system prompt (doesn’t scale), or you embed discovery logic into the agent itself (slow and unreliable). Neither handles the real world, where services appear and disappear, rate limits change, and dependencies matter.
The Model Context Protocol (MCP) is directly addressing this. Instead of burning tokens describing available tools, MCP standardizes how agents ask “what can you do?” and receive structured capability advertisements. A service responds with: “I book hotels, I require a credit card and email, I have a 100 request/hour limit, I depend on a payment gateway being alive.” The agent consumes this once, caches it, and reasons over it deterministically.
This matters because without standardized discovery, agents waste token budget re-learning the same tool constraints on every call. With MCP, you describe a capability once and the agent uses that definition across thousands of invocations.
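A minimal sketch of the "consume once, cache, reuse" idea, assuming discovery is a function you can call per service. The names `CapabilityCache` and `fake_discover` and the 60-second TTL are hypothetical; a real setup would put an actual discovery request (for example an MCP tool listing) behind `fetch`.

```python
import time
from typing import Callable, Dict, Tuple


class CapabilityCache:
    """Fetch each service's capability description once per TTL window."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # service name -> (time fetched, capability description)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, service: str) -> dict:
        now = self._clock()
        entry = self._entries.get(service)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, no discovery call needed
        capabilities = self._fetch(service)
        self._entries[service] = (now, capabilities)
        return capabilities


# Hypothetical stand-in for a real discovery call.
discovery_calls = []


def fake_discover(service: str) -> dict:
    discovery_calls.append(service)
    return {
        "name": service,
        "requires": ["credit_card", "email"],
        "rate_limit": "100/hour",
    }


cache = CapabilityCache(fake_discover, ttl_seconds=60.0)
first = cache.get("book_hotel")
second = cache.get("book_hotel")  # served from the cache
print(len(discovery_calls))  # 1
```

The TTL is the knob that trades freshness for cost: short enough that a changed rate limit gets noticed, long enough that the agent is not re-discovering on every call.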
Ordering and Sequencing
Discovery is half the problem. The other half is knowing which service to call first when multiple options exist and they have interdependencies.
Say you’re booking a vacation. The agent needs to:
- Check availability (flight API)
- Reserve a slot (flight API again, with confirmation)
- Book accommodations (hotel API, needs check-in date from flight)
- Process payment (payment processor, needs total from hotel + flight)
- Send confirmation (email or SMS, depends on user preference)
If the agent calls these in the wrong order, or doesn’t wait for one to complete before starting another, the whole chain fails. Traditional RAG can’t solve this because there’s no retrieval happening—it’s pure orchestration.
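The ordering constraint in the vacation example can be sketched as a dependency graph and sorted with Python's standard `graphlib` module (Python 3.9+). The step names are made up for illustration; the point is that the call order is derived from declared dependencies rather than left to the model.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it needs first.
dependencies = {
    "check_availability": set(),
    "reserve_flight": {"check_availability"},
    "book_hotel": {"reserve_flight"},
    "process_payment": {"reserve_flight", "book_hotel"},
    "send_confirmation": {"process_payment"},
}


def call_order(deps: dict) -> list:
    """Return a valid execution order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = call_order(dependencies)
print(order)
# ['check_availability', 'reserve_flight', 'book_hotel', 'process_payment', 'send_confirmation']
```

A circular declaration (hotel needs payment, payment needs hotel) fails loudly at planning time instead of halfway through a booking.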
The pattern that works in production is a combination of:
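Tool dependency graphs. Define which services depend on output from others. You can express this in the capability definition: “I need booking_id from the flight API before I can reserve your hotel.” In MCP that means custom metadata alongside the tool definition, since the core tool schema describes a tool’s name, description, and input schema rather than its dependencies. With the graph declared, the agent respects this ordering by construction, not by luck.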
Verification signals. After each service call, check the response schema. Does it match what you expect? If not, halt and escalate rather than pretending it succeeded. This is where most agents fail—they see a 200 response and assume the service did what they asked.
Fallback chains. What happens if the primary hotel API is down? You need a secondary provider listed in the discovery response. The agent should try primary first, timeout after N seconds, then try secondary. MCP lets you encode this as structured metadata rather than implicit agent behavior.
Here’s a simplified example of how this works in practice:
# Tool definition with dependency metadata
{
    "name": "book_hotel",
    "description": "Reserve a hotel room",
    "requires": ["flight_confirmation_id"],  # Must run after flight booking
    "rate_limit": "100/hour",
    "dependencies": ["payment_processor"],  # Needs this to be healthy
    "fallback": "book_hotel_secondary",  # Backup if primary fails
    "timeout_seconds": 30,
    "response_schema": {
        "type": "object",
        "required": ["booking_id", "check_in", "check_out", "total_price"]
    }
}
When the agent encounters this, it knows: don’t call me until you have flight_confirmation_id. If I take longer than 30 seconds, assume I failed and try the fallback. Always validate my response against that schema.
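Here is a rough sketch of how a runtime might act on a definition like that: refuse to call before the `requires` inputs exist, and walk the `fallback` chain when a provider times out. It assumes each handler takes `(context, timeout_seconds)` and raises `TimeoutError` or `ConnectionError` on failure; that handler signature, `call_with_fallback`, and the sample data are hypothetical, not part of any framework.

```python
from typing import Callable, Dict, List, Tuple

# Assumed handler signature: handler(context, timeout_seconds) -> response dict.
Handler = Callable[[dict, float], dict]


def missing_inputs(tool: dict, context: dict) -> List[str]:
    """Names in tool["requires"] that the agent has not produced yet."""
    return [name for name in tool.get("requires", []) if name not in context]


def call_with_fallback(
    tool_name: str,
    tools: Dict[str, dict],
    handlers: Dict[str, Handler],
    context: dict,
) -> Tuple[str, dict]:
    """Call a tool; on timeout or connection failure, walk its fallback chain."""
    tried: List[str] = []
    current = tool_name
    while current is not None and current not in tried:
        tool = tools[current]
        tried.append(current)
        missing = missing_inputs(tool, context)
        if missing:
            raise ValueError(f"{current} called too early, missing: {missing}")
        try:
            response = handlers[current](context, tool.get("timeout_seconds", 30))
            return current, response
        except (TimeoutError, ConnectionError):
            current = tool.get("fallback")  # None ends the chain
    raise RuntimeError(f"all providers failed: {tried}")


tools = {
    "book_hotel": {
        "requires": ["flight_confirmation_id"],
        "fallback": "book_hotel_secondary",
        "timeout_seconds": 30,
    },
    "book_hotel_secondary": {
        "requires": ["flight_confirmation_id"],
        "timeout_seconds": 30,
    },
}


def primary(context: dict, timeout: float) -> dict:
    raise TimeoutError("primary hotel API did not answer in time")


def secondary(context: dict, timeout: float) -> dict:
    return {"booking_id": "H-1", "check_in": "2025-06-01",
            "check_out": "2025-06-05", "total_price": 480.0}


handlers = {"book_hotel": primary, "book_hotel_secondary": secondary}
used, response = call_with_fallback(
    "book_hotel", tools, handlers, {"flight_confirmation_id": "F-42"}
)
print(used)  # book_hotel_secondary
```

The `tried` list guards against a fallback chain that loops back on itself, which is an easy mistake once definitions come from many teams.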
Real-World Constraints
Services in production have quirks. Some require you to call them in a specific sequence or they’ll reject you as a bot. Some have regional availability (a payment processor that works in the US but not the EU). Some fail silently and return 200 OK with an empty response.
The agents that work handle this by treating service calls as state machines, not magic. After calling a service, they validate:
- Did I get a response in time?
- Does the response match the declared schema?
- Did the response contain the fields I need for the next step?
If any of these fail, the agent backtracks, tries a fallback, or escalates to a human. Most frameworks skip this validation because it feels verbose, but it’s the difference between demos and production.
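The three checks above map naturally onto a small decision function. This is a sketch under assumptions: the `Outcome` states, `check_step`, and the policy of "fall back if a fallback exists, otherwise escalate" are illustrative choices, and the field check only looks for missing or empty values rather than running a full schema validator.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Outcome(Enum):
    PROCEED = "proceed"
    FALLBACK = "fallback"
    ESCALATE = "escalate"


def check_step(
    response: Optional[dict],
    elapsed_seconds: float,
    timeout_seconds: float,
    required_fields: Sequence[str],
    next_step_fields: Sequence[str],
    has_fallback: bool,
) -> Tuple[Outcome, List[str]]:
    """Decide what the agent does next, and say why."""
    problems: List[str] = []
    body = response if isinstance(response, dict) else {}

    # 1. Did I get a response in time?
    if elapsed_seconds > timeout_seconds:
        problems.append("no response in time")

    # 2. Does the response carry every field the schema declares as required?
    missing = [f for f in required_fields if body.get(f) in (None, "")]
    if missing:
        problems.append(f"schema mismatch, missing {missing}")

    # 3. Does it contain what the next step needs?
    unusable = [
        f for f in next_step_fields
        if f not in missing and body.get(f) in (None, "")
    ]
    if unusable:
        problems.append(f"next step cannot run without {unusable}")

    if not problems:
        return Outcome.PROCEED, problems
    return (Outcome.FALLBACK if has_fallback else Outcome.ESCALATE), problems


# A "200 OK with an empty body" is caught here instead of being passed along.
outcome, reasons = check_step(
    {}, 0.4, 30, ["booking_id", "total_price"], ["total_price"], has_fallback=True
)
print(outcome, reasons)
```

Returning the reasons alongside the outcome matters when a human picks up the escalation: they see what failed, not just that something did.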
Discovery at Scale
If you’re integrating hundreds of real services, hardcoding tool definitions is dead weight. The pattern that’s emerging is:
- Maintain a service catalog (could be a database, could be a config file, could be a live API).
- Each service advertises itself via a standardized endpoint or config format.
- The agent queries the catalog once per session (or once per cache window) to learn what’s available.
- The agent filters by constraints: “I need a payment processor that supports recurring billing and works in 50+ countries.”
- The agent ranks by freshness, reliability, or cost and picks the best match.
This is what MCP enables. Without it, you’re back to manual integration work.
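The filter-then-rank steps can be sketched in a few lines, assuming the catalog has already been loaded into memory. `ServiceEntry`, its fields, and the sample processors are hypothetical; the ranking here (best uptime first, then lowest cost) is one reasonable choice, not the only one.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class ServiceEntry:
    name: str
    category: str
    features: FrozenSet[str]
    countries_supported: int
    uptime_30d: float      # fraction of time available, 0.0 to 1.0
    cost_per_call: float


def select_services(
    catalog: Iterable[ServiceEntry],
    category: str,
    required_features: Set[str],
    min_countries: int,
) -> List[ServiceEntry]:
    """Filter by hard constraints, then rank by uptime (high first) and cost (low first)."""
    candidates = [
        s for s in catalog
        if s.category == category
        and required_features <= s.features
        and s.countries_supported >= min_countries
    ]
    return sorted(candidates, key=lambda s: (-s.uptime_30d, s.cost_per_call))


catalog = [
    ServiceEntry("pay_a", "payment", frozenset({"recurring_billing", "refunds"}), 120, 0.999, 0.30),
    ServiceEntry("pay_b", "payment", frozenset({"refunds"}), 190, 0.9999, 0.25),
    ServiceEntry("pay_c", "payment", frozenset({"recurring_billing"}), 60, 0.999, 0.20),
    ServiceEntry("pay_d", "payment", frozenset({"recurring_billing"}), 12, 0.9995, 0.10),
]

ranked = select_services(catalog, "payment", {"recurring_billing"}, min_countries=50)
print([s.name for s in ranked])  # ['pay_c', 'pay_a']
```

Hard constraints are applied before ranking on purpose: the cheapest processor is useless if it does not support recurring billing.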
The Gap That Still Exists
Agents are still bad at discovering services they’ve never seen before and making good decisions about which to use. If you give an agent 10 payment processors, it often picks randomly or picks the first one alphabetically. Better approaches weight by:
- Which one has handled similar transaction sizes recently?
- Which one had the best uptime over the last 30 days?
- Which one is cheapest for this particular use case?
This requires agents to have observability into past service performance, which most frameworks don’t provide. You need to build a lightweight metrics layer that tracks success rate, latency, and cost per service, then expose that to the agent as context.
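A lightweight metrics layer does not need much. The sketch below keeps running totals in memory and renders them as plain text the agent can read as context; `ServiceMetrics` and its method names are hypothetical, and a production version would persist the data and window it (say, to the last 30 days).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServiceStats:
    calls: int = 0
    successes: int = 0
    total_latency: float = 0.0
    total_cost: float = 0.0


class ServiceMetrics:
    """Tracks success rate, latency, and cost per service."""

    def __init__(self) -> None:
        self._stats: Dict[str, ServiceStats] = defaultdict(ServiceStats)

    def record(self, service: str, success: bool, latency_seconds: float, cost: float) -> None:
        stats = self._stats[service]
        stats.calls += 1
        stats.successes += int(success)
        stats.total_latency += latency_seconds
        stats.total_cost += cost

    def summary(self) -> Dict[str, dict]:
        # Every entry has at least one call, so the divisions are safe.
        return {
            name: {
                "success_rate": s.successes / s.calls,
                "avg_latency_s": s.total_latency / s.calls,
                "avg_cost": s.total_cost / s.calls,
            }
            for name, s in self._stats.items()
        }

    def as_context(self) -> str:
        """Render the summary as text to include in the agent's context."""
        return "\n".join(
            f"{name}: success_rate={m['success_rate']:.2f}, "
            f"avg_latency_s={m['avg_latency_s']:.2f}, avg_cost={m['avg_cost']:.2f}"
            for name, m in sorted(self.summary().items())
        )


metrics = ServiceMetrics()
metrics.record("pay_a", True, 0.25, 0.5)
metrics.record("pay_a", False, 0.75, 1.5)
metrics.record("pay_c", True, 0.5, 0.2)
print(metrics.as_context())
```

Call `record` from the same place you validate responses, so a silent failure counts against the service's success rate instead of disappearing.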
Bottom line: Use MCP or an MCP-equivalent to standardize how agents discover services and their dependencies, and enforce strict validation of responses before the agent proceeds to the next step. Without structured discovery and dependency tracking, multi-step real-world workflows fail silently or cascade into worse failures downstream.
Question via Hacker News