Series

Office Hours

13 editions

  1. Office Hours — We're using an LLM to extract structured data from messy PDFs. Sometimes it works perfectly; sometimes it misses fields or invents data. How do I know if the problem is the model, my prompt, or the PDF quality itself?

    A daily developer question about AI/LLMs, answered with a direct, opinionated take.

  2. Office Hours — We're getting inconsistent outputs from the same prompt with GPT-5.4. Temperature is locked at 0. What's actually going on?

  3. Office Hours — I'm using Claude Opus 4.6 for a customer-facing summarization task. Should I batch requests during off-peak hours to save money, or just call the API in real-time?

  4. Office Hours — How do I know when to stop prompt engineering and just upgrade my model?

  5. Office Hours — Is it better to improve the harness around the LLM or wait for a better model?

  6. Office Hours — Should I A/B test my LLM prompts in production or is that overkill?

  7. Office Hours — What's the hardest part of building AI agents that actually work?

  8. Office Hours — How do you actually test LLM apps beyond vibe checks?

  9. Office Hours — Why is AI agent reliability barely improving despite 18 months of model upgrades?

  10. Office Hours — How are people safely reusing cached LLM answers in production RAG systems?

  11. Office Hours — Do structured outputs from LLMs create false confidence that the response is actually correct?

  12. Office Hours — How are you handling LLM API costs in production without sacrificing quality?

  13. Office Hours — How do I actually know if my LLM is hallucinating in production?