Office Hours — When building AI agents, how do you decide between deterministic state machines and freeform LLM reasoning?

This is the wrong binary. The question assumes you’re choosing one or the other, but production agents almost always need both, just in different places. The real decision is where to draw the line.

Determinism wins when the path is known

Use a state machine (or workflow DAG, or explicit routing logic) whenever success has a clear definition and the steps are predictable. Filing an insurance claim, approving a purchase order, provisioning infrastructure, processing a payment, extracting fields from a structured form. These tasks have a goal state you can verify. The agent either succeeded or failed.

Deterministic scaffolding is also your friend when you need auditability, compliance, or rollback. If someone asks “why did this happen?”, a state machine’s execution trace is immediate and defensible. An LLM’s reasoning is a story it’s telling you post-hoc, which may or may not match what actually happened.
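
To make the auditability point concrete, here is a minimal sketch of a state machine that records every transition it takes. The purchase-order states, the event names, and the PurchaseOrder class are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

# Allowed transitions for a hypothetical purchase-order workflow.
# Anything not listed here is illegal by construction.
TRANSITIONS = {
    ("submitted", "validate_ok"): "validated",
    ("submitted", "validate_fail"): "rejected",
    ("validated", "approve"): "approved",
    ("validated", "deny"): "rejected",
}


@dataclass
class PurchaseOrder:
    state: str = "submitted"
    # Audit log of (from_state, event, to_state) tuples, in order.
    trace: list = field(default_factory=list)

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal transition: {event!r} from {self.state!r}")
        new_state = TRANSITIONS[key]
        self.trace.append((self.state, event, new_state))
        self.state = new_state
        return new_state


order = PurchaseOrder()
order.fire("validate_ok")
order.fire("approve")
print(order.trace)
```

The trace is the answer to “why did this happen?”: it is a record of what the system did, not a narrative generated afterwards.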

Cost is another factor. A state machine that routes to Claude Opus 4.8 only for genuinely ambiguous steps is cheaper than asking Claude to reason through every decision. Travelers’ claims assistant uses routing logic to determine when to escalate to human judgment versus letting the agent proceed autonomously. That’s determinism saving money and keeping humans accountable.
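
A rough sketch of that routing idea: try cheap rules first and call the model only when no rule applies. The refund thresholds, the lookup_rule and decide names, and the stubbed model call are assumptions for illustration, not a description of any real system.

```python
from typing import Callable, Optional


def lookup_rule(step: dict) -> Optional[str]:
    """Cheap deterministic path: return a decision if a rule covers the step."""
    if step["type"] == "refund" and step["amount"] <= 50:
        return "auto_approve"
    if step["type"] == "refund" and step["amount"] > 5000:
        return "escalate_to_human"
    return None  # No rule applies: the step is ambiguous.


def decide(step: dict, call_model: Callable[[dict], str]) -> tuple[str, str]:
    """Return (decision, source), where source records which path decided."""
    decision = lookup_rule(step)
    if decision is not None:
        return decision, "rule"
    # Only ambiguous steps pay for a model call.
    return call_model(step), "model"


# Stand-in for a real LLM call (hypothetical).
fake_model = lambda step: "needs_review"

print(decide({"type": "refund", "amount": 20}, fake_model))
print(decide({"type": "refund", "amount": 400}, fake_model))
```

Returning the source alongside the decision also lets you measure what fraction of traffic actually reaches the model.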

Freeform reasoning works when you don’t know the answer

Use open-ended LLM reasoning when the problem is genuinely novel or the solution space is too large to enumerate. Investigating why revenue dropped, debugging a production incident, analyzing a new market, writing a proposal. These aren’t workflows with fixed steps. They’re sense-making problems where the agent needs to hypothesize, test, and adapt on the fly.

The tradeoff is that freeform reasoning is harder to debug, costlier, and slower. If Claude spends 10 minutes reasoning through a problem and arrives at a wrong answer, you don’t have a clear intervention point. With a state machine, if step three fails, you fix step three.

Long chains of freeform reasoning also have a reliability ceiling in practice. The Daily Signal recently surfaced that agents drift when there’s no fast objective signal. If you’re asking an LLM to make 15 consecutive analytical judgments with no intermediate ground-truth check, errors compound. By contrast, an agent that checks its work against an external truth at each step (a test passes, a linter succeeds, a database query returns valid data) gets corrected before a mistake can propagate.
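
A back-of-the-envelope way to see the compounding, assuming each judgment is independently correct with the same probability p. That is a simplification, and the 0.95 below is an illustrative number, not a measurement.

```latex
P(\text{all } n \text{ steps correct}) = p^{n}, \qquad 0.95^{15} \approx 0.46
```

Under those assumptions, a chain of 15 unchecked judgments that are each 95% reliable comes out fully correct less than half the time. A per-step check changes the picture because an error is caught where it occurs instead of being carried into every later step.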

The hybrid pattern

Build agents with a deterministic skeleton and LLM flesh.

Here’s a concrete example: a customer support escalation system.

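# Deterministic routing skeleton
class SupportAgent:
    def __init__(self, known_solutions):
        # Vetted fixes the agent may return without human review
        self.known_solutions = set(known_solutions)

    def handle_ticket(self, ticket):
        # Step 1: Classify (deterministic, or LLM but cached/evaluated)
        category = self.classify_ticket(ticket)

        if category == "billing":
            # Deterministic path: look up account, apply rule
            return self.billing_pathway(ticket)

        if category == "technical":
            # Freeform reasoning: agent troubleshoots
            diagnosis = self.llm_reason(
                f"Troubleshoot this technical issue: {ticket.description}"
            )

            # Immediately verify against knowns
            if diagnosis in self.known_solutions:
                return diagnosis

            # If novel, escalate with the agent's reasoning as context
            return self.escalate_to_human(ticket, context=diagnosis)

        # Default: human loop
        return self.escalate_to_human(ticket)

    # The four hooks below are placeholders; wire them to your own systems.
    def classify_ticket(self, ticket):
        raise NotImplementedError

    def billing_pathway(self, ticket):
        raise NotImplementedError

    def llm_reason(self, prompt):
        raise NotImplementedError

    def escalate_to_human(self, ticket, context=None):
        raise NotImplementedError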

The skeleton is deterministic: classify, then dispatch to the appropriate handler. Within the “technical” branch, the agent reasons freely but is immediately constrained by a verifiability check. If the diagnosis matches a known solution, proceed. If not, escalate with the agent’s reasoning as context for the human. You get freeform creativity without unbounded risk.

Another pattern: let the agent generate freely, then gate its output through deterministic verification. GitHub Copilot routes code changes through a deterministic linter and test suite after the agent generates them. Anthropic’s Project Glasswing uses Claude to reason about code structure but verifies findings through deterministic vulnerability checkers. The reasoning is freeform. The verification is not.
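
Here is one way such a gate might look, as a sketch under stated assumptions. It does not describe how either product above works. The gate, parses, and defines_function helpers are hypothetical, and Python’s standard ast module stands in for a real linter and test suite.

```python
import ast
from typing import Callable


def parses(source: str) -> bool:
    """Deterministic check: is the candidate syntactically valid Python?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def defines_function(name: str) -> Callable[[str], bool]:
    """Build a check that the candidate defines a function with this name."""
    def check(source: str) -> bool:
        tree = ast.parse(source)
        return any(
            isinstance(node, ast.FunctionDef) and node.name == name
            for node in ast.walk(tree)
        )
    return check


def gate(candidate: str, checks) -> tuple[bool, list[str]]:
    """Run every check; a check that raises counts as a failure."""
    failed = []
    for label, check in checks:
        try:
            ok = check(candidate)
        except Exception:
            ok = False
        if not ok:
            failed.append(label)
    return (not failed), failed


checks = [("parses", parses), ("defines add", defines_function("add"))]
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b) return a + b"
print(gate(good, checks))
print(gate(bad, checks))
```

The model can produce whatever it likes; nothing moves forward unless every check passes, and the list of failed labels gives you the intervention point.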

When you can’t decide, measure

If you’re unsure whether a task needs freeform reasoning or can work deterministically, pilot both. Build the deterministic version first, since it’s faster and cheaper. Measure where it fails. Those failures tell you whether you need LLM reasoning or just more specific rules.
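
A small sketch of what measuring where it fails could look like: run a rule-based handler over labeled tickets and separate cases where no rule fired from cases where the wrong rule fired. The keyword rules, the category names, and the sample tickets are invented for illustration.

```python
from collections import Counter


def rule_based_handler(ticket: str):
    """Deterministic first version: keyword rules, None when nothing matches."""
    text = ticket.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return None


def pilot(labeled_tickets):
    """Tally outcomes and collect the tickets the rules got wrong or skipped."""
    outcomes = Counter()
    misses = []
    for text, expected in labeled_tickets:
        got = rule_based_handler(text)
        if got == expected:
            outcomes["correct"] += 1
        elif got is None:
            outcomes["no_rule"] += 1
            misses.append(text)
        else:
            outcomes["wrong_rule"] += 1
            misses.append(text)
    return outcomes, misses


sample = [
    ("I want a refund for my last order", "billing"),
    ("Reset my password please", "account"),
    ("The app crashes when I upload a photo", "technical"),
    ("Refund page shows a password error", "technical"),
]
outcomes, misses = pilot(sample)
print(dict(outcomes))
```

A pile of no_rule misses that share a pattern usually means you need another rule. Misses that resist any rule you can write down are the candidates for LLM reasoning.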

Qwen3.7-Max’s 35-hour autonomous chip optimization run worked because it had continuous feedback loops: the agent generated code, ran it against a simulator, got a score, adjusted. That’s not fully freeform reasoning. It’s freeform generation constrained by deterministic scoring.
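
The general shape of that loop can be sketched in a few lines. This is a toy, not a reconstruction of that run: score is a stand-in for a simulator, propose is a stand-in for the generative step (in a real agent, an LLM call), and the target value 42 is arbitrary.

```python
def score(candidate: int) -> float:
    """Deterministic scorer standing in for a simulator. Higher is better."""
    return -abs(candidate - 42)


def propose(current: int, step: int) -> list[int]:
    """Stand-in for the generative step: suggest nearby candidates."""
    return [current - step, current + step]


def optimize(start: int, max_rounds: int = 50):
    best, best_score = start, score(start)
    step = 16
    history = [(best, best_score)]
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best, step):
            s = score(candidate)
            # Only a strictly better score moves the agent forward.
            if s > best_score:
                best, best_score, improved = candidate, s, True
        if improved:
            history.append((best, best_score))
        elif step > 1:
            step //= 2  # No progress: try smaller adjustments.
        else:
            break  # Smallest step and no progress: stop.
    return best, history


best, history = optimize(0)
print(best, len(history))
```

The proposals can be as creative as you like. What keeps the run on track is that a candidate is only accepted when the deterministic score says it is better.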

One practical heuristic: if you can write a test for success, use determinism. If you can only recognize success by eye or after human review, add freeform reasoning, but wrap it in escalation logic.

Bottom line: Start deterministic, add freeform reasoning only where you can’t define a success condition upfront, and always verify freeform outputs against something concrete before moving forward. The future of reliable agents is deterministic scaffolding with strategic LLM reasoning inside.

Question via Hacker News