Office Hours — Is it better to improve the harness around the LLM or wait for a better model?

Is it better to improve the harness around the LLM or wait for a better model?

Improve the harness. Waiting is a trap that keeps products from ever shipping.

Here’s why: model upgrades arrive unpredictably, sit outside your control, and can take months to integrate into production safely. Your harness, though? That’s leverage you own right now: better retrieval, smarter prompting, structured outputs, retry logic with fallbacks, grounding against your actual data, prompt caching to reduce latency and cost, and tool design that actually fits your domain. These improvements compound.
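To make "retry logic with fallbacks" concrete, here is a minimal sketch. It assumes each model is wrapped as a plain function from prompt to text; the names call_with_fallback, ModelCall, flaky and stable are hypothetical and are not any provider's API.

```python
import time
from typing import Callable, Optional, Sequence

# Assumption: every model is wrapped as a function from prompt to text.
ModelCall = Callable[[str], str]


def call_with_fallback(
    prompt: str,
    models: Sequence[ModelCall],
    retries_per_model: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry with exponential backoff before falling back."""
    last_error: Optional[Exception] = None
    for call in models:
        for attempt in range(retries_per_model):
            try:
                return call(prompt)
            except Exception as err:  # real code would catch provider-specific errors
                last_error = err
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error


# Hypothetical stand-ins for real provider clients.
def flaky(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def stable(prompt: str) -> str:
    return "ok: " + prompt


if __name__ == "__main__":
    # The primary fails twice, then the fallback answers.
    print(call_with_fallback("hello", [flaky, stable], base_delay=0.01))
```

The sleep function is injectable so the backoff can be skipped in tests. In production you would likely narrow the except clause to timeouts and rate-limit errors rather than retrying on everything.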

I’ve watched teams sit on GPT-4.1 Nano waiting for the next frontier model while their competitors shipped Claude Opus 4.6 with five layers of careful prompt engineering and won. The Opus team didn’t need a better base model. They needed better plumbing.

That said, don’t ignore model releases entirely. When GPT-5.4 or Gemini 3.1 Pro drops, run a time-boxed eval against your harness: a day of testing is often enough, a week at most. But don’t pause shipping to rewrite everything.
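A time-boxed eval stays cheap if the test cases already exist as code. The sketch below is one possible shape, not a standard tool: EvalCase, pass_rate, compare and the two stub models are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: every model is wrapped as a function from prompt to text.
ModelCall = Callable[[str], str]


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def pass_rate(model: ModelCall, cases: List[EvalCase]) -> float:
    """Fraction of cases whose output passes its check."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


def compare(models: Dict[str, ModelCall], cases: List[EvalCase]) -> Dict[str, float]:
    """Run the same cases through every candidate and report pass rates."""
    return {name: pass_rate(fn, cases) for name, fn in models.items()}


# Hypothetical stubs standing in for the current and the newly released model.
def current_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unsure"


def candidate_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"


CASES = [
    EvalCase("What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("Capital of France?", lambda out: "Paris" in out),
]

if __name__ == "__main__":
    print(compare({"current": current_model, "candidate": candidate_model}, CASES))
```

With real cases drawn from production traffic, the comparison gives a direct answer to whether the new model is worth switching to, without touching the rest of the harness.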

The real play is this: build your harness so it’s model-agnostic. Swap models or providers with a flag. Then you get the best of both worlds: better prompts and tools work on any frontier model, and when a new one lands, you pick up the improvement without rearchitecting.
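One way to get that flag is a small interface plus a registry, so the harness never imports a provider directly. This is a sketch under assumptions: ChatModel, REGISTRY, load_model, the HARNESS_MODEL environment variable and the two toy models are all hypothetical, and a real adapter would wrap a provider SDK behind the same complete method.

```python
import os
from typing import Callable, Dict, Optional, Protocol


class ChatModel(Protocol):
    """The only surface the harness is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Toy stand-in for one provider adapter."""

    def complete(self, prompt: str) -> str:
        return prompt


class UpperModel:
    """Toy stand-in for a second provider adapter."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Adding a new model means adding one adapter class and one registry entry.
REGISTRY: Dict[str, Callable[[], ChatModel]] = {
    "echo": EchoModel,
    "upper": UpperModel,
}


def load_model(name: Optional[str] = None) -> ChatModel:
    """Pick the model from an explicit name or the HARNESS_MODEL env var."""
    name = name or os.environ.get("HARNESS_MODEL", "echo")
    if name not in REGISTRY:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(REGISTRY)}")
    return REGISTRY[name]()


def answer(question: str, model: ChatModel) -> str:
    """Harness logic (prompting, tools, retries) lives here, independent of the model."""
    prompt = f"Answer concisely.\n\nQuestion: {question}"
    return model.complete(prompt)


if __name__ == "__main__":
    print(answer("Is the harness model-agnostic?", load_model()))
```

Because answer only sees the ChatModel interface, prompt and tool improvements carry over unchanged when the flag points at a different model.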

Bottom line: a solid harness shipped today beats a better model tomorrow. Once you’re live, model upgrades become force multipliers instead of blockers.

Question via Hacker News