Office Hours — We're getting inconsistent outputs from the same prompt with GPT-5.4. Temperature is locked at 0. What's actually going on?

A daily developer question about AI/LLMs, answered with a direct, opinionated take.

We’re getting inconsistent outputs from the same prompt with GPT-5.4. Temperature is locked at 0. What’s actually going on?

Temperature 0 means greedy decoding, which should be deterministic in principle, but in practice you're probably hitting one of three things. First, the model itself has some residual non-determinism even at temperature 0: GPU floating-point operations and server-side batching can shift the logits by tiny amounts from run to run. OpenAI's docs acknowledge this now, though it's rare. Second, you might have different system prompts, different conversation histories, or slightly different input formatting between calls that you're not noticing. Check your logs carefully, including whitespace and special characters.
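That second cause is the one people miss, because a trailing space or a non-breaking space looks identical in most log viewers. A minimal sketch of a prompt differ (the `prompt_diff` helper is mine, not part of any SDK) that names every character where two "identical" prompts disagree:

```python
import unicodedata

def prompt_diff(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (index, char_in_a, char_in_b) for every position where two
    prompts differ, with Unicode names so invisible characters show up."""
    diffs = []
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else "<missing>"
        cb = b[i] if i < len(b) else "<missing>"
        if ca != cb:
            name_a = unicodedata.name(ca, "?") if len(ca) == 1 else ca
            name_b = unicodedata.name(cb, "?") if len(cb) == 1 else cb
            diffs.append((i, f"{ca!r} ({name_a})", f"{cb!r} ({name_b})"))
    return diffs

# A non-breaking space and a trailing space, both invisible in most logs:
print(prompt_diff("Summarize this.", "Summarize\u00a0this. "))
```

Run this over the raw request bodies from your logs before assuming the inputs match.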

Third, and most likely, you're bumping into context window boundaries, or the model is legitimately uncertain about the right answer and picking from a very tight distribution. At temperature 0 it still takes the highest-probability token, but if the top few tokens have nearly identical logits, noise on the order of that floating-point variance is enough to flip which one wins, and one flipped token early in the response cascades into a visibly different completion.
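A toy illustration of that near-tie flip (made-up logit values, not the real model's): a perturbation far smaller than anything you'd notice changes which token greedy decoding selects.

```python
# Two runs of the "same" forward pass, with a ~1e-7 wobble on one logit,
# which is well within plausible float32 accumulation noise.
logits_run_a = {"Yes": 11.3070001, "Maybe": 11.3070000, "No": 4.2}
logits_run_b = {"Yes": 11.3069999, "Maybe": 11.3070000, "No": 4.2}

def greedy_pick(logits: dict[str, float]) -> str:
    """Temperature 0: always take the highest-logit token."""
    return max(logits, key=logits.get)

print(greedy_pick(logits_run_a))  # Yes
print(greedy_pick(logits_run_b))  # Maybe
```

When the top candidates are that close, "deterministic" decoding is only as deterministic as the arithmetic feeding it.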

Run a controlled test: make 50 API calls with byte-for-byte identical request bodies and log every request and response in full. If you still see variance, file it with OpenAI support and attach the logs. If the variance disappears, your inputs weren't as identical as you thought.
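A sketch of that harness, written so the API call is injected as a callable (the wiring in the comment uses the OpenAI Python SDK; the model name comes from your question and the prompt is a placeholder):

```python
from collections import Counter

def variance_test(call_model, n: int = 50) -> Counter:
    """Call `call_model()` n times with identical input and tally the
    distinct outputs. More than one distinct output means real variance."""
    return Counter(call_model() for _ in range(n))

# Wiring it to the OpenAI SDK:
# from openai import OpenAI
# client = OpenAI()
# call = lambda: client.chat.completions.create(
#     model="gpt-5.4",
#     temperature=0,
#     messages=[{"role": "user", "content": "Classify this ticket: ..."}],
# ).choices[0].message.content
# print(variance_test(call))
```

If the tally has one key, your pipeline was the problem; if it has several, you have logs ready to hand to support.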

Bottom line: Lock down your inputs first (logs, whitespace, everything), run a controlled test to confirm the variance is real, then escalate if it persists.