Office Hours: How is AI-assisted coding actually changing your development workflow and productivity in a real codebase?

A daily developer question about AI/LLMs, answered with a direct, opinionated take. Published 2026-06-10.

Daily: one question from the trenches, one opinionated answer.

How is AI-assisted coding actually changing your development workflow and productivity in a real codebase?

The honest answer is more nuanced than “coding is now 3x faster.” AI-assisted coding has fundamentally changed what I spend time on, but the productivity gains are heavily dependent on task type and codebase maturity.

Where AI Coding Actually Moves the Needle

Boilerplate and glue code disappear. I used to spend 20-30 minutes writing test fixtures, API route handlers, or database migrations. Now Claude Code or Cursor Agent handles that in about 90 seconds. Those savings are real, and they compound across a project.

Pattern matching and refactoring got faster. When I’m reshaping a function signature across 40 call sites or converting a callback-based API to async/await, agents can reliably handle the mechanical parts. I describe the change, it shows me a diff, I verify it makes sense, and we move on. Without agents, that’s an hour of careful search-and-replace.
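To make the mechanical part concrete, here is a minimal sketch of a callback-to-async/await conversion, assuming a hypothetical `loadUserCb(id, cb)` function that follows Node’s `(err, result)` callback convention (it stands in for a real I/O call and is not from any particular codebase). The only real logic is the `Promise` wrapper; every call site then changes from nested callbacks to a flat `await`, which is the kind of repetitive edit an agent handles well across 40 call sites.

```typescript
// Hypothetical callback-style API using Node's (err, result) convention.
type User = { id: string; name: string };
type Callback<T> = (err: Error | null, result?: T) => void;

function loadUserCb(id: string, cb: Callback<User>): void {
  // Simulate asynchronous I/O with a timer.
  setTimeout(() => {
    if (id === "") cb(new Error("empty id"));
    else cb(null, { id, name: `user-${id}` });
  }, 0);
}

// The Promise wrapper: the one piece of non-mechanical logic in the refactor.
function loadUser(id: string): Promise<User> {
  return new Promise((resolve, reject) => {
    loadUserCb(id, (err, user) => {
      if (err) reject(err);
      else resolve(user as User);
    });
  });
}

// Before: a call site written with nested callbacks.
function greetCb(id: string, done: Callback<string>): void {
  loadUserCb(id, (err, user) => {
    if (err) return done(err);
    done(null, `hello, ${user!.name}`);
  });
}

// After: the same call site with async/await; errors surface as rejections.
async function greet(id: string): Promise<string> {
  const user = await loadUser(id);
  return `hello, ${user.name}`;
}
```

Node’s built-in `util.promisify` does the same wrapping for any function that follows the `(err, result)` convention, so a hand-written wrapper like `loadUser` mainly earns its keep when the callback shape is irregular.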

Exploratory coding in unfamiliar libraries accelerates. When I’m working in a new language or framework, instead of reading docs for 30 minutes, I ask Claude Code to show me the idiomatic pattern. It gets it right 70-80% of the time, and even counting the misses, that beats fumbling through documentation alone.

Where AI Coding Hits Hard Limits

Architectural decisions still need human judgment. An agent can write a REST endpoint, but deciding whether your system should use a message queue, direct RPC, or event sourcing requires understanding trade-offs that no prompt captures. I’ve seen teams burn time because their agent built technically correct code that violated unstated architectural principles.

Monolithic business logic is where agents start hallucinating. If your feature touches five different subsystems, spans multiple services, and depends on subtle state invariants, the agent will confidently write code that compiles but breaks production. I’ve caught this dozens of times: the generated code looks plausible, tests pass locally, but it misses edge cases because the agent couldn’t reason about the full system context.

Debugging failures is still mostly on you. When an agent generates code that breaks in subtle ways—off-by-one errors in retry logic, race conditions in concurrent code, or type mismatches in complex generic systems—the agent often can’t diagnose why. You end up debugging its output, which sometimes takes longer than writing it yourself.
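Retry logic shows why these bugs survive review. The sketch below assumes a hypothetical synchronous `withRetry(op, maxAttempts)` helper where `maxAttempts` counts total attempts (backoff and delays are left out to keep it short). The buggy variant uses a loop bound that quietly makes one attempt too few; it type-checks, and it passes every test in which the operation succeeds early.

```typescript
// Buggy variant: the bound `maxAttempts - 1` was written as if it counted
// retries, so the operation runs only maxAttempts - 1 times.
function withRetryBuggy<T>(op: () => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts - 1; attempt++) {
    try {
      return op();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Corrected variant: exactly maxAttempts calls, with the count validated up front.
function withRetry<T>(op: () => T, maxAttempts: number): T {
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return op();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Test double: fails the first `failures` calls, then returns the call count.
function flaky(failures: number): () => number {
  let calls = 0;
  return () => {
    calls++;
    if (calls <= failures) throw new Error(`fail #${calls}`);
    return calls;
  };
}

// withRetry(flaky(2), 3)      -> 3 (succeeds on the third attempt)
// withRetryBuggy(flaky(2), 3) -> throws "fail #2" (only two attempts made)
```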

The context window is a real ceiling. I have a 50k-line monolith, and Claude Code can usefully see maybe 10k lines of it at once. For changes touching multiple parts of the codebase, I’m constantly cutting and pasting relevant context, and in a codebase that size the context-juggling overhead is substantial.
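One way to make that cutting and pasting less ad hoc is to pick context deliberately against a budget. This is a hypothetical helper, not a feature of Claude Code or any other tool: it assumes you already have a relevance score per file (from grep hits, import distance, or your own judgment) and uses line counts as a rough stand-in for tokens, with the 10k-line budget taken from the estimate above. The file paths are made up for illustration.

```typescript
// Hypothetical candidate file with its size and a relevance score.
interface Candidate {
  path: string;
  lines: number;
  relevance: number; // higher means more relevant
}

// Greedy packing: take files in descending relevance, skipping any file
// that would push the total past the budget.
function packContext(candidates: Candidate[], budgetLines: number): Candidate[] {
  const sorted = [...candidates].sort((a, b) => b.relevance - a.relevance);
  const picked: Candidate[] = [];
  let used = 0;
  for (const c of sorted) {
    if (used + c.lines <= budgetLines) {
      picked.push(c);
      used += c.lines;
    }
  }
  return picked;
}

// Hypothetical files from a large codebase, packed into a 10k-line budget.
const picked = packContext(
  [
    { path: "billing/invoice.ts", lines: 4000, relevance: 0.9 },
    { path: "billing/tax.ts", lines: 7000, relevance: 0.8 },
    { path: "shared/money.ts", lines: 1500, relevance: 0.7 },
  ],
  10_000,
);
// tax.ts is skipped because 4000 + 7000 would exceed the budget.
console.log(picked.map((c) => c.path).join(", ")); // billing/invoice.ts, shared/money.ts
```

Greedy selection is not optimal (choosing files under a size budget is a knapsack problem), but its choices are easy to predict and explain.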

A Real Example: Refactoring a Payment Handler

Last month I refactored a payment processing module that had grown into 2,000 lines handling Stripe, Square, and PayPal. The old code was a mess of conditional logic for different provider APIs.

I asked Claude Code to extract provider-specific logic into separate classes following a strategy pattern. It generated a clean implementation in about 5 minutes. But it missed two things: a cached provider lookup that was doing implicit initialization, and a subtle timing issue in our test setup that depended on the old monolithic structure.
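Here is a minimal sketch of the shape of that refactor, assuming a simplified, hypothetical `PaymentProvider` interface; no real Stripe, Square, or PayPal SDK calls are made, and the transaction IDs are fake. The registry at the bottom reproduces the kind of coupling the agent missed: looking a provider up also initializes it, so any code path that constructs a strategy directly skips that side effect.

```typescript
// Hypothetical strategy interface; real SDK calls are stubbed out.
interface PaymentProvider {
  readonly name: string;
  init(): void;
  charge(amountCents: number): string; // returns a fake transaction id
}

abstract class BaseProvider implements PaymentProvider {
  abstract readonly name: string;
  private initialized = false;

  init(): void {
    // Stand-in for loading API keys, creating an SDK client, etc.
    this.initialized = true;
  }

  charge(amountCents: number): string {
    if (!this.initialized) throw new Error(`${this.name} used before init()`);
    if (!Number.isInteger(amountCents) || amountCents <= 0) {
      throw new RangeError("amountCents must be a positive integer");
    }
    return `${this.name}-txn-${amountCents}`;
  }
}

class StripeProvider extends BaseProvider { readonly name = "stripe"; }
class SquareProvider extends BaseProvider { readonly name = "square"; }
class PayPalProvider extends BaseProvider { readonly name = "paypal"; }

// Lazy registry: the first lookup constructs AND initializes the provider.
const factories: Record<string, () => PaymentProvider> = {
  stripe: () => new StripeProvider(),
  square: () => new SquareProvider(),
  paypal: () => new PayPalProvider(),
};
const providerCache = new Map<string, PaymentProvider>();

function getProvider(name: string): PaymentProvider {
  let provider = providerCache.get(name);
  if (!provider) {
    const make = factories[name];
    if (!make) throw new Error(`unknown provider: ${name}`);
    provider = make();
    provider.init(); // the implicit initialization a refactor can silently drop
    providerCache.set(name, provider);
  }
  return provider;
}

console.log(getProvider("stripe").charge(1999)); // stripe-txn-1999
// new StripeProvider().charge(1999) would throw: direct construction skips init().
```

Keeping `init()` inside `getProvider` preserves the old behavior; the guard in `charge` is what turns a silently dropped initialization into a loud failure.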

I spent 90 minutes debugging those issues. Writing the refactor from scratch would have taken 3 hours. So the agent saved roughly 85 minutes (about 5 minutes of generation plus 90 of debugging, versus 180 by hand), but those savings came with unexpected debugging work. Without the agent, I would have moved more methodically and caught those issues during my own initial implementation.

The Real Productivity Change

Velocity on greenfield features and straightforward refactors improved by maybe 40-50%. I ship test files, utility functions, and boilerplate faster. But on complex features touching multiple systems, the agent acts more like a pair programmer who generates code quickly but isn’t always reliable—I still need to review carefully, test thoroughly, and often fix things.

The biggest workflow change is psychological: I’m more willing to refactor early because the mechanical parts feel cheap. I’m faster at exploring APIs I don’t know. But this creates risk if you’re not disciplined about testing and code review. Teams adopting agents without strengthening their testing practices often ship more bugs, not fewer.

Cost and Token Reality

On typical days, I spend 15-25k tokens using Claude Code for active development, which is maybe $0.10-0.20 per day in API costs. On heavy refactoring days it’s 2-3x that. If you’re using GitHub Copilot with its multi-model option (which includes Claude Sonnet 4.6), you’re already paying the subscription, so the marginal cost is negligible. For teams using Claude Opus 4.8 directly, the cost is higher but still small relative to developer salaries.
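The arithmetic behind figures like these is tokens times per-million-token rates, summed over input and output. The sketch below uses illustrative rates of $3 per million input tokens and $15 per million output tokens, plus an assumed 20k/5k input/output split; both are placeholders rather than a quote of current pricing for any model, so substitute your provider’s published rates and your own usage logs.

```typescript
// Illustrative per-million-token rates in USD (placeholders, not real pricing).
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost = input tokens x input rate + output tokens x output rate, per million.
function dailyCostUsd(inputTokens: number, outputTokens: number, rates: Rates): number {
  return (
    (inputTokens / 1_000_000) * rates.inputPerMillion +
    (outputTokens / 1_000_000) * rates.outputPerMillion
  );
}

const illustrative: Rates = { inputPerMillion: 3, outputPerMillion: 15 };

// Assumed split: 20k input + 5k output tokens -> 0.06 + 0.075 = 0.135 USD.
console.log(dailyCostUsd(20_000, 5_000, illustrative).toFixed(3)); // 0.135
```

Output tokens cost more per token under these assumed rates, so two days with the same total token count can cost noticeably different amounts depending on the split.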

The hidden cost is in review time. Every agent-generated PR requires careful review. In my experience, you’re catching real bugs in maybe 15-20% of generated code. That review overhead is the actual tax on productivity, not the API cost.

Bottom line: AI-assisted coding delivers meaningful velocity gains on mechanical tasks and straightforward patterns, but doesn’t replace thoughtful architecture or careful testing. The productivity win scales with codebase clarity—well-organized code with strong tests is easier for agents to extend reliably. Start with boilerplate and low-risk refactors to build trust, then expand agent usage as you learn where it’s safe in your specific workflow.

Question via Hacker News