Office Hours — How do I actually know if my LLM is hallucinating in production?
A daily developer question about AI/LLMs, answered with a direct, opinionated take.
You don’t, not really—not until a user complains or your monitoring catches it. But you can reduce the damage.
First, distinguish between types. Factual hallucinations (making up data) are different from reasoning hallucinations (bad logic). For factual stuff, you need grounding: RAG with a retrieval quality check, or constraint-based generation (limiting outputs to predefined options). For reasoning, temperature helps less than you’d think—lower temps reduce variance, not hallucinations.
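Constraint-based generation can be as simple as refusing to pass raw model text downstream. A minimal sketch, assuming a closed set of intents (the `ALLOWED_INTENTS` values and the mapping logic here are hypothetical, not a specific library's API):

```python
# Constraint-based generation sketch: the model's free-text answer is only
# accepted if it maps onto a predefined option; anything else falls back.
ALLOWED_INTENTS = {"refund", "cancel", "escalate", "unknown"}

def constrain(raw_answer: str, allowed=ALLOWED_INTENTS, fallback="unknown"):
    """Map a free-text model answer onto a closed set of options."""
    normalized = raw_answer.strip().lower()
    for option in allowed:
        if option in normalized:
            return option
    return fallback  # never let unconstrained text through
```

The point of the fallback: a hallucinated category becomes a visible `"unknown"` you can count and route, instead of garbage flowing downstream.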
In production, instrument three things: (1) confidence signals from your LLM (e.g., GPT’s logprobs to flag low-confidence tokens), (2) disagreement detection (run the same query twice—if answers differ significantly, you have a problem), and (3) user feedback loops (make it trivial to flag bad outputs).
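Signals (1) and (2) above are a few lines of code once you have the raw data. A minimal sketch, assuming token logprobs arrive from your provider (e.g. the `logprobs` option on OpenAI's chat completions) as plain `(token, logprob)` pairs; the threshold of `-2.0` (roughly 13.5% token probability) is an illustrative starting point, not a recommendation:

```python
import difflib

def low_confidence_tokens(token_logprobs, threshold=-2.0):
    """Flag tokens whose log-probability falls below the threshold."""
    return [tok for tok, lp in token_logprobs if lp < threshold]

def answers_disagree(a: str, b: str, min_similarity=0.8) -> bool:
    """Crude disagreement check: run the same query twice and compare.
    Low string similarity between the two answers is a red flag."""
    return difflib.SequenceMatcher(None, a, b).ratio() < min_similarity
```

String similarity is a blunt instrument (paraphrases will false-positive); swap in an embedding comparison if that matters, but even this catches the worst cases.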
The unsexy truth: you can’t fully prevent hallucinations with frontier models today. GPT-5 and Claude still make things up. So design your system to contain hallucinations—never let an LLM output directly into critical paths (payments, medical decisions, legal docs). Always have a human review step or deterministic verification layer.
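What a deterministic verification layer looks like in practice, as a minimal sketch: the LLM proposes, plain code checks the proposal against a source of truth, and anything unverifiable is routed to a human. The `ORDER_TOTALS` dict and function names here are hypothetical stand-ins for your database and routing logic:

```python
# Deterministic verification in front of a critical path (refunds):
# the LLM never writes to the payment system directly.
ORDER_TOTALS = {"order-123": 49.99}  # hypothetical source of truth

def approve_refund(order_id: str, llm_proposed_amount: float):
    actual = ORDER_TOTALS.get(order_id)
    if actual is None or llm_proposed_amount > actual:
        return ("human_review", None)  # contain the hallucination
    return ("approved", llm_proposed_amount)
```

Note the asymmetry: the happy path is automated, but every failure mode degrades to human review rather than to a wrong payment.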
Bottom line: Stop trying to prevent hallucinations and start designing systems where they can’t cause harm—use RAG for facts, keep humans in the loop for decisions that matter.