Series

Office Hours

13 editions

Office Hours — We're using an LLM to extract structured data from messy PDFs. Sometimes it works perfectly, sometimes it misses fields or invents data. How do I know if the problem is the model, my prompt, or the PDF quality itself?
A daily developer question about AI/LLMs, answered with a direct, opinionated take.
April 3, 2026
Office Hours — We're getting inconsistent outputs from the same prompt with GPT-5.4. Temperature is locked at 0. What's actually going on?
April 2, 2026
Office Hours — I'm using Claude Opus 4.6 for a customer-facing summarization task. Should I batch requests during off-peak hours to save money, or just call the API in real-time?
April 1, 2026
Office Hours — How do I know when to stop prompt engineering and just upgrade my model?
March 31, 2026
Office Hours — Is it better to improve the harness around the LLM or wait for a better model?
March 30, 2026
Office Hours — Should I A/B test my LLM prompts in production or is that overkill?
March 29, 2026
Office Hours — What's the hardest part of building AI agents that actually work?
March 28, 2026
Office Hours — How do you actually test LLM apps beyond vibe checks?
March 27, 2026
Office Hours — Why is AI agent reliability barely improving despite 18 months of model upgrades?
March 26, 2026
Office Hours — How are people safely reusing cached LLM answers in production RAG systems?
March 25, 2026
Office Hours — Do structured outputs from LLMs create false confidence that the response is actually correct?
March 24, 2026
Office Hours — How are you handling LLM API costs in production without sacrificing quality?
March 23, 2026
Office Hours — How do I actually know if my LLM is hallucinating in production?
March 22, 2026