Office Hours — How much are you actually spending on AI coding at work?

How much are you actually spending on AI coding at work?

Most teams don’t have a real answer to this question. They’ll tell you “not that much,” but that’s usually because they’re not tracking it. The teams that are tracking it are often shocked.

The Real Cost Picture

If you’re using GitHub Copilot, you’re probably on a seat license, around $30/month per developer. That feels cheap until you realize you’re also running inference against OpenAI’s API for any task that doesn’t fit in the IDE. If your team is 20 engineers, you’re at $600/month baseline, plus API costs that nobody’s actually monitoring.

If you’re using Claude Code, Claude Opus 4.7 usage-based pricing, or GPT-5.5 via the API, costs are per-request. At scale, this gets complicated fast. A single autonomous task—cloning a repo, running tests, fixing failures, opening a PR—might hit the API 50+ times. If you’ve got five engineers, each running five tasks a day, you’re looking at 1,250 API calls daily. Multiply that by your model pricing and you start seeing real money.

A Concrete Example

Let’s say your team uses Claude Opus 4.7 for autonomous coding tasks. A single autonomous workflow typically:

Reads the repo (2-3 API calls)
Analyzes the failing test (2-3 API calls)
Generates a fix (1 call)
Validates against linting (1 call)
Runs full test suite (1 call, sometimes with retries)
Opens a PR with context (1 call)

That’s roughly 8-10 API calls per task. If five engineers each run three autonomous tasks a day, that’s 120-150 API calls daily. Over a month, you’re looking at 3,600-4,500 requests. Claude Opus 4.7 pricing varies by input/output tokens, but a typical task at roughly 5K input tokens and 2K output tokens hits around $0.15-0.25 per request. That’s $540-1,125 per month just for one team on one tool.

But that’s the happy path. Most teams run:

Multiple retries because agents get stuck or hallucinate
Exploratory runs where engineers test different prompts
Duplicate work because context gets lost between sessions
Long-running agentic operations where the agent takes inefficient paths

In practice, that $540-1,125 often becomes $2,000-3,500/month for a five-person team actually using agents in production workflows.

Where Teams Leak Money

The biggest leak isn’t in the base model costs. It’s in inefficiency. If your agent is retrying the same failing test five times instead of once, that’s 4x the cost for zero additional value. If your developers are running exploratory prompts in a ChatGPT Plus subscription instead of batching into an API call, you’re losing visibility and control entirely.

Many teams also don’t realize they’re paying for overlapping tools. You’ll have GitHub Copilot (seat license), ChatGPT Plus for personal use ($20/month per engineer), Claude Opus 4.7 subscriptions for some people, and random Claude Code licenses. That can easily hit $200-300/month per person without anyone noticing.

Another leak: token waste. If your agentic system re-reads the entire codebase on every task instead of caching context or using shorter prompts, you’re burning tokens on redundant context. GPT-5.5 has prompt caching built in, but most teams aren’t using it. If you’re submitting 50KB of code context instead of 5KB through structured summarization, you’re 10x the cost for the same task quality.

What Actually Matters

Track API usage by engineer, by task type, and by outcome. Don’t just look at monthly spend. Look at cost per successfully completed task. If you’re spending $300/month but getting three autonomous refactors per engineer out of it, that’s $50 per refactor. If you’re spending $300/month and getting zero completed tasks (all failures or manual cleanup), that’s a waste.

Set hard API budget limits by team member. Enforce them. When someone hits the limit, they have to explain why before they get more quota. This creates visibility and prevents runaway agentic loops from burning through your budget silently.

Use model caching (Claude’s prompt caching, GPT-5.5’s cache control header) for repetitive tasks. If your agent is reading the same file or the same test output multiple times, caching can cut your costs by 30-50%.

Consider batching requests during off-peak hours if your coding agent work isn’t time-sensitive. Some providers offer batch pricing that’s significantly cheaper than real-time API calls. If you don’t need results in five minutes, batch can save 40-60% on large agentic runs.

The Hard Truth

If you’re getting real value—developers shipping more code, fewer bugs, faster refactors—then the cost is justified. Most teams aren’t tracking whether they’re actually getting value. They just see a bill and assume it’s fine because “it’s AI, so it should be helping.”

Measure it. Track cost per shipped feature. Track cost per test pass. Track cost per bug fixed. If your autonomous coding system is costing you $2,000/month and getting you three shipping features per month, that’s $667 per feature. Whether that’s expensive depends on your velocity baseline. If you were shipping two features per month before, it’s a win. If you were shipping five, it’s not.

Bottom line: Stop assuming your AI coding spend is fine. Audit what you’re actually paying across all tools and agents, then measure cost per concrete outcome (tasks completed, bugs fixed, features shipped). Most teams can cut costs 30-40% just by eliminating overlapping tools and enabling caching, while simultaneously improving visibility into whether AI coding is actually delivering value.

Question via Hacker News