Office Hours — What hiring criteria should you use when your team is heavily using AI-assisted coding?
A daily developer question about AI/LLMs, answered with a direct, opinionated take.
Coding velocity is no longer the primary signal. When your whole team is shipping with Claude Code, Cursor Agent, or similar autonomous tools, raw coding speed becomes table stakes. What actually matters now is judgment: knowing when to let an agent loose, when to intervene, and whether the output is correct.
Hire for code review instincts. The person who can read a 200-line agent-generated PR in 30 seconds and spot the subtle off-by-one error in a loop condition, or catch the missing null check buried in logic the agent synthesized, is now your leverage point. The same goes for architecture sense: can they tell whether an agent's module design will scale, or whether it's locally correct but globally a mess?
Consider a concrete example. An agent generates a database migration that looks reasonable: it adds a column, backfills existing rows, then adds a NOT NULL constraint. But a sharp reviewer catches that the backfill query doesn't handle the edge case where your app uses a sentinel value (say, -1) for missing data. The agent treated -1 as a real timestamp. That's the skill you're hiring for: domain-specific pattern matching that no agent has seen enough of in its training data.
Here’s what that looks like in practice. A candidate reviews this migration:
ALTER TABLE users ADD COLUMN last_login_ts BIGINT;
UPDATE users SET last_login_ts = COALESCE(last_login_epoch, 0);
ALTER TABLE users ADD CONSTRAINT last_login_ts_not_null CHECK (last_login_ts IS NOT NULL);
The agent thought it was done. The candidate asks: "What happens to users where last_login_epoch is -1?" They check the app code. Find that -1 is a sentinel for "never logged in." Realize that COALESCE only rewrites NULLs: the -1 sentinel passes through untouched and now reads as a legitimate login timestamp one second before the Unix epoch, while genuinely NULL rows become 0, which looks like January 1, 1970. The feature breaks. The agent's code was syntactically sound and would pass any generic test suite. The candidate's domain knowledge caught the invariant violation.
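You can demonstrate the catch in a few lines. Here's a minimal sketch using Python's built-in sqlite3; the table and values are invented to mirror the example, and SQLite doesn't support ADD CONSTRAINT, so the constraint step is elided:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login_epoch BIGINT)")
conn.executemany(
    "INSERT INTO users (id, last_login_epoch) VALUES (?, ?)",
    [(1, 1700000000),  # a real login time
     (2, -1),          # the app's sentinel: never logged in
     (3, None)],       # NULL: also never logged in
)

# The agent's backfill, verbatim: COALESCE only rewrites NULL.
conn.execute("ALTER TABLE users ADD COLUMN last_login_ts BIGINT")
conn.execute("UPDATE users SET last_login_ts = COALESCE(last_login_epoch, 0)")

print(list(conn.execute("SELECT id, last_login_ts FROM users ORDER BY id")))
# [(1, 1700000000), (2, -1), (3, 0)] -- the sentinel survived as a
# "real" timestamp; the NULL row now reads as January 1, 1970.

# The candidate's fix maps both missing-data encodings to NULL, which
# also means dropping the NOT NULL constraint (or keeping the sentinel):
conn.execute(
    "UPDATE users SET last_login_ts = "
    "CASE WHEN last_login_epoch IS NULL OR last_login_epoch = -1 "
    "THEN NULL ELSE last_login_epoch END"
)

A dozen lines against an in-memory database turn a review hunch into a demonstrated bug, which is exactly the verification discipline the next section is about.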
Skepticism, Not Paranoia
Look for skepticism of AI-generated code. Not paranoia, but healthy distrust. Someone who treats agent output as a draft that needs validation, not gospel. Ask candidates how they’d verify a complex feature an agent built. The best answer isn’t “I’d trust it if tests pass”—it’s “I’d understand the intent before running it.”
This distinction matters operationally. When Cursor Agent or Claude Code refactors your authentication layer across three services, a paranoid hire will demand a rewrite. A skeptical hire will trace the logic, check the test coverage, understand the threat model, and, if it's sound, ship it. Speed compounds over months. Paranoia compounds into shipping delays.
Ambiguous Ownership and Domain Depth
Screen for people comfortable with ambiguous ownership. When an agent co-authors a feature, who debugs it when it breaks in production? Who owns the design decision when three paths were possible and the agent picked one without trade-off analysis? Your team needs people who can navigate that without religious debates about "real coding." Pragmatism matters more than purity.
Domain knowledge becomes more valuable, not less. Agents are terrible at business logic they’ve never seen before. If you’re building fintech or medical software, hire people who understand the domain deeply enough to catch when an agent’s “reasonable” solution would break real-world guarantees. A frontier model doesn’t know your specific regulatory constraints or the edge cases baked into your business rules. Your senior engineer does. That knowledge gap is where the real bugs hide.
The New Bottleneck: Specification and Intent
There’s a second-order hiring signal that matters now: can someone write a specification that an agent can actually execute? Not prose specs, but structured intent. “Build a circuit breaker for our rate limiter” is vague. “Build a circuit breaker that tracks failure rate over a 60-second sliding window, opens after 5 consecutive 429 responses, half-opens after 30 seconds, resets after 10 consecutive successful requests—document the state machine in comments” is executable.
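That second spec pins down an actual state machine. Here's a rough Python sketch of what it resolves to (the class and method names are invented for illustration, and the 60-second sliding-window failure-rate tracking is omitted to keep it short):

import time

class CircuitBreaker:
    # States: "closed" -> "open" -> "half-open" -> "closed".
    # Thresholds come straight from the spec above.
    def __init__(self, open_after=5, probe_after=30.0, close_after=10):
        self.state = "closed"
        self.failures = 0    # consecutive 429s
        self.successes = 0   # consecutive successes while half-open
        self.opened_at = 0.0
        self.open_after = open_after
        self.probe_after = probe_after
        self.close_after = close_after

    def allow(self):
        # Open circuit: reject until the 30-second cool-off, then probe.
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.probe_after:
                return False
            self.state = "half-open"
        return True

    def record(self, status_code):
        if status_code == 429:
            self.successes = 0
            self.failures += 1
            # Any failure while half-open, or the 5th consecutive 429,
            # (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.open_after:
                self.state = "open"
                self.opened_at = time.monotonic()
                self.failures = 0
        else:
            self.failures = 0
            if self.state == "half-open":
                self.successes += 1
                if self.successes >= self.close_after:
                    self.state = "closed"  # 10 straight successes: reset
                    self.successes = 0

Whether the agent produces exactly this is beside the point. The spec is tight enough that a reviewer can diff the agent's output against it, state by state.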
Candidates who’ve worked with Claude Opus 4.7, GPT-5.4, or Gemini 3.1 Pro know this instinctively. They’ve learned that vague prompts produce vague code. Ask how they’d specify a multi-step refactor to an agent. Their answer reveals whether they think about intent versus syntax.
This is the flip side of velocity gains. You offload syntax and boilerplate to agents. You keep humans deep in the domain, reading specs, understanding why the system was built the way it was, and catching the moment an agent's plausible-looking code violates an unstated invariant. You also keep humans in the loop for the translation layer between business requirement and technical specification. Agents generate from specs. Humans write specs.
Bottom line: Stop hiring for typing speed and start hiring for judgment, verification discipline, domain depth, and specification clarity. The agent does the typing. You need humans who know when the agent is wrong, why it matters, and how to set up the next agent to succeed.
Question via Hacker News