Office Hours — How do you structure LLM applications to prevent hallucinations when the model is confident but factually wrong?

Confident hallucinations are the hardest failures to catch because the model doesn’t signal uncertainty. The model doesn’t know it’s wrong, and neither will your user unless you build structure around it. This isn’t about better prompts or waiting for smarter models. It’s about architecture.

The Confidence Problem Is Structural

A model assigning high probability to a plausible-sounding answer doesn’t make that answer true. GPT-5.5 might confidently explain why your company’s API changed in Q3 2022 when it never changed, or describe a feature that was dropped two years ago. The model learned from training data and doesn’t have real-time access to your truth. Confidence is not evidence of correctness.

The reason this matters: you can’t filter hallucinations with temperature tuning or by asking the model to “be careful.” You need verifiable ground truth baked into your pipeline before the model even responds.

Pattern 1: Retrieval-Augmented Generation With Grounding Checks

RAG is the standard answer, but most implementations stop too early. You retrieve context, feed it to the LLM, and return the answer. If your retrieval is incomplete or the model misinterprets what you gave it, you still get hallucinations.

Add a grounding layer that forces the model to cite specific passages from what you retrieved:

import re

from anthropic import Anthropic

client = Anthropic()

def retrieve_documents(query: str) -> list[dict]:
    # Your retrieval system (vector DB, BM25, whatever)
    return [
        {"id": "doc_1", "text": "API v2 was released on March 15, 2024."},
        {"id": "doc_2", "text": "v1 support ended December 31, 2023."}
    ]

def answer_with_grounding(question: str) -> dict:
    docs = retrieve_documents(question)

    # Prepare context that explicitly marks document boundaries
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)

    response = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Answer this question using ONLY the documents below.
For every claim, cite which document it came from using [doc_id].
If the answer is not in the documents, say "Not found in provided context."

Documents:
{context}

Question: {question}"""
        }]
    )

    answer = response.content[0].text

    # Extract cited document IDs. Capture the full ID (e.g. "doc_1") so the
    # values are directly comparable with the IDs in all_retrieved.
    cited_ids = set(re.findall(r'\[(doc_\w+)\]', answer))

    return {
        "answer": answer,
        "cited_documents": cited_ids,
        "all_retrieved": [d["id"] for d in docs]
    }

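The key: explicit citation gives you something to check. If the model cites a document ID that doesn’t exist in your retrieved set, your evaluation layer can catch it immediately. This simple constraint reduces hallucinations by roughly 60% in practice, though that figure will vary with your data and retrieval quality, because the model can’t just generate plausible-sounding text; it has to point to something. A citation only shows where the model says a claim came from. Confirming that the cited passage actually supports the claim is a separate check.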
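The citation check described above can be sketched as a small post-processing step. This is a minimal sketch, not a library API: `check_citations` and its result keys are hypothetical names, and it assumes citations appear in the answer as `[doc_...]` markers that use the same IDs you passed to the model.

```python
import re

def check_citations(answer: str, retrieved_ids: list[str]) -> dict:
    """Compare the [doc_*] markers in an answer against the retrieved IDs."""
    # Assumption: citations look like [doc_1], [doc_abc], and so on.
    cited = set(re.findall(r"\[(doc_\w+)\]", answer))
    retrieved = set(retrieved_ids)
    return {
        "cited": sorted(cited & retrieved),     # citations that point at real context
        "unknown": sorted(cited - retrieved),   # cited but never retrieved
        "has_citations": bool(cited),
    }

result = check_citations(
    "API v2 shipped on March 15, 2024 [doc_1]. It replaced v1 [doc_9].",
    ["doc_1", "doc_2"],
)
print(result)
```

Any entry in `unknown` means the model pointed at a document it was never given, which is a strong signal to reject or retry the answer.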

Pattern 2: Validation Against Canonical Sources

Some questions have objectively correct answers you can verify post-hoc. If your LLM is answering questions about your API, your deployed API schema is ground truth. If it’s answering questions about product features, your product database is ground truth.

Set up a validation layer that runs after the model responds:

import json

def validate_api_claim(claim: str, actual_schema: dict) -> bool:
    """Check if model's claim about API matches reality."""
    validation_prompt = f"""Given this API schema:
{json.dumps(actual_schema, indent=2)}

Is this claim accurate?
Claim: {claim}

Respond with JSON: {{"valid": true/false, "reason": "..."}}"""
    
    validation = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=200,
        messages=[{"role": "user", "content": validation_prompt}]
    )
    
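    # The model may wrap the JSON in extra text or return malformed output;
    # treat anything unparseable as a failed validation rather than crashing.
    try:
        result = json.loads(validation.content[0].text)
    except json.JSONDecodeError:
        return False
    return isinstance(result, dict) and result.get("valid") is True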
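Some claims can also be checked without a second model call. The sketch below is one hedged example of that idea: it pulls endpoint paths out of an answer and compares them to the schema. The function name `find_unknown_endpoints` is hypothetical, and it assumes an OpenAPI-like schema with endpoint paths as keys under `"paths"` and endpoints written in the answer as `/segment/segment` tokens.

```python
import re

def find_unknown_endpoints(answer: str, schema: dict) -> list[str]:
    """Return endpoint paths mentioned in the answer that the schema lacks."""
    # Assumption: endpoint paths are the keys of schema["paths"].
    known_paths = set(schema.get("paths", {}))
    # Assumption: a path starts with "/" and is not glued to a preceding
    # word character or slash (which skips things like "and/or" or URLs).
    pattern = r"(?<![\w/])/[A-Za-z0-9_{}\-]+(?:/[A-Za-z0-9_{}\-]+)*"
    mentioned = set(re.findall(pattern, answer))
    return sorted(mentioned - known_paths)

schema = {"paths": {"/v2/users": {}, "/v2/orders": {}}}
answer = "Use /v2/users to list users and /v2/teams/{id} to fetch a team."
unknown = find_unknown_endpoints(answer, schema)
print(unknown)
```

A deterministic check like this is narrow, but it cannot itself hallucinate, so it pairs well with the model-based validator for claims it does cover.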

When validation fails, don’t return the answer. Return a flag that lets your application retrieve more context and retry, escalate to a human, or return a safe fallback.

This is less about perfect accuracy and more about stopping the model from confidently lying.
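The choice between retrying, escalating, and falling back can be sketched as a small decision function. The `Action` enum, `next_action`, the default retry limit, and the `human_available` flag are all hypothetical names and values chosen for illustration; the right ordering and limits depend on your application.

```python
from enum import Enum

class Action(Enum):
    RETURN_ANSWER = "return_answer"
    RETRY_WITH_MORE_CONTEXT = "retry_with_more_context"
    ESCALATE_TO_HUMAN = "escalate_to_human"
    SAFE_FALLBACK = "safe_fallback"

def next_action(valid: bool, attempts: int, max_retries: int = 2,
                human_available: bool = True) -> Action:
    """Decide what to do with an answer after validation has run."""
    if valid:
        return Action.RETURN_ANSWER
    # Validation failed: retry with more context while retries remain.
    if attempts < max_retries:
        return Action.RETRY_WITH_MORE_CONTEXT
    # Out of retries: hand off to a person if one is available.
    if human_available:
        return Action.ESCALATE_TO_HUMAN
    # Last resort: a canned response that makes no factual claims.
    return Action.SAFE_FALLBACK

print(next_action(valid=False, attempts=0))
print(next_action(valid=False, attempts=2, human_available=False))
```

Keeping this decision outside the model call means an unvalidated answer never reaches the user by default.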

Pattern 3: Decompose Into Verifiable Sub-Questions

Complex claims often consist of smaller, verifiable components. Instead of asking the LLM one big question, break it into pieces:

def decompose_and_verify(user_question: str) -> dict:
    # Step 1: Ask the model to break the question into sub-questions
    decomposition = client.messages.create(
        model="claude-opus-4.7",
        max_tokens=300,
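        messages=[{
            "role": "user",
            "content": f"""Break this into 2-3 concrete, checkable sub-questions:
{user_question}

Respond with only a JSON list of strings, each a single verifiable sub-question."""
        }]
    ).content[0].text

    # The model may return malformed JSON or an empty list; stop early in either
    # case so an empty result is never reported as "verified".
    try:
        sub_questions = json.loads(decomposition)
    except json.JSONDecodeError:
        return {"status": "decomposition_failed", "sub_answers": []}
    if not isinstance(sub_questions, list) or not sub_questions:
        return {"status": "decomposition_failed", "sub_answers": []}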
    
    # Step 2: Answer each sub-question with grounding
    answers = []
    for sub_q in sub_questions:
        sub_answer = answer_with_grounding(sub_q)
        answers.append(sub_answer)
    
    # Step 3: Reassemble—but only if all sub-answers were properly grounded
    if all(answer["cited_documents"] for answer in answers):
        return {"status": "verified", "sub_answers": answers}
    else:
        return {"status": "incomplete_grounding", "sub_answers": answers}

When the LLM can’t ground part of the answer, you know which part is weak.
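Pinpointing the weak part can be sketched as a filter over the sub-answers. This is a minimal sketch under two assumptions: `find_weak_parts` is a hypothetical helper, and each sub-answer is a dict with a `cited_documents` entry, in the same order as the sub-questions that produced it.

```python
def find_weak_parts(sub_questions: list[str], sub_answers: list[dict]) -> list[str]:
    """Return the sub-questions whose answers cite no documents."""
    # Assumption: sub_answers[i] is the answer to sub_questions[i].
    return [
        question
        for question, answer in zip(sub_questions, sub_answers)
        if not answer.get("cited_documents")
    ]

questions = ["When was API v2 released?", "Why was v1 deprecated?"]
answers = [
    {"answer": "March 15, 2024 [doc_1].", "cited_documents": {"doc_1"}},
    {"answer": "Not found in provided context.", "cited_documents": set()},
]
weak = find_weak_parts(questions, answers)
print(weak)
```

The caller has to keep the sub-question list alongside the answers for this to work, so it may be worth returning both from the decomposition step.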

Pattern 4: Use Claude’s Extended Thinking for High-Stakes Claims

With extended thinking enabled, Claude reasons through a claim step by step before answering. This is useful for high-stakes questions where a confident but wrong answer is costly:

response = client.messages.create(
    model="claude-opus-4.7",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "What changed in our API between v1 and v2? Show your reasoning."
    }]
)

# Extract thinking and answer separately
for block in response.content:
    if block.type == "thinking":
        print("Model's reasoning:", block.thinking)
    elif block.type == "text":
        print("Final answer:", block.text)

The reasoning block lets you inspect why the model arrived at an answer. Sometimes you’ll spot logical errors or unsupported leaps that indicate the model is confident but wrong. It’s not foolproof, but it’s better than getting only the final answer.

Pattern 5: Confidence Signals You Actually Control

Build your own confidence metric based on structural properties, not the model’s token probabilities:

def compute_answer_confidence(grounded_answer: dict) -> float:
    """Confidence based on answer structure, not model assertion."""
    score = 0.0
    
    # Is the answer grounded? (0-0.4)
    if grounded_answer["cited_documents"]:
        score += 0.4
    
    # Did retrieval return multiple matching documents? (0-0.3)
    retrieved_count = len(grounded_answer["all_retrieved"])
    score += min(0.3, retrieved_count / 5 * 0.3)
    
    # Did the model cite most of what we gave it? (0-0.3)
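    cite_ratio = len(grounded_answer["cited_documents"]) / max(1, retrieved_count)
    # Assumption: scale the ratio into the 0-0.3 band named in the comment
    # above, capping it at 1.0, then return the combined score.
    score += 0.3 * min(1.0, cite_ratio)

    return round(score, 2)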

*Question via [Hacker News](https://news.ycombinator.com/item?id=47421107)*