
The Stack — Klarna's AI Assistant

A technical teardown of Klarna's AI Assistant: the models, infrastructure, and engineering decisions behind the product.

Reverse-engineering the architecture behind real AI products.

Klarna’s AI Assistant is the customer service automation layer that replaced 700 human agents — and then had to explain why it kept hiring them back.

What It Is

Klarna’s in-app AI assistant handles customer service conversations across refunds, disputes, payment plan changes, and order tracking for roughly 85 million consumers in 45 markets. It launched publicly in early 2024 and became one of the most cited enterprise deployments of conversational AI in financial services. Klarna’s own communications team turned it into a marketing story — which means the engineering constraints are unusually visible.

The Architecture

Klarna has publicly confirmed they’re an OpenAI customer, and the original assistant was built on top of OpenAI’s API rather than self-hosted models. Given the latency requirements of a chat interface and the regulatory complexity of operating across 45 markets with different consumer protection laws, the API-first approach makes sense here: Klarna gets model updates without managing infrastructure, and OpenAI’s uptime SLAs remove one class of operational risk in a payments context. Klarna has not publicly disclosed which model version powers the current generation of the assistant.

The core of the system is almost certainly RAG-heavy rather than fine-tuned. Klarna’s customer service surface is enormous — return policies vary by merchant, dispute resolution rules vary by jurisdiction, and payment plan logic is transactional state that changes per-user. You can’t bake that into weights. What you can do is build a retrieval layer that pulls the right policy document, the right merchant ruleset, and the right account state at query time, then let the model reason over a well-constructed context window. Klarna has talked publicly about integrating the AI with their backend systems, which confirms the assistant is doing live data lookups rather than operating off static knowledge.
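To make the pattern concrete, here is a minimal runnable sketch of that kind of retrieval layer. Everything in it is hypothetical: the function names (search_policies, fetch_account_state), the data shapes, and the stubbed backends stand in for real vector search and account APIs. The point is the split between slow-changing policy text and live transactional state.

```python
# Minimal sketch of the pattern, not Klarna's actual code: combine
# retrieved policy text (slow-changing knowledge) with live account
# state (transactional data) into one prompt context.
from dataclasses import dataclass

def search_policies(query: str, market: str, top_k: int = 4) -> list[str]:
    """Stand-in for vector search over policy and merchant-rule documents."""
    return [f"[{market}] policy chunk relevant to: {query!r}"][:top_k]

def fetch_account_state(user_id: str) -> dict:
    """Stand-in for live lookups against orders/payments/disputes backends."""
    return {
        "open_orders": [{"id": "o-123", "status": "shipped"}],
        "payment_plans": [{"id": "p-9", "next_due": "2026-06-01"}],
        "open_disputes": [],
    }

@dataclass
class PromptContext:
    policy_chunks: list[str]
    account_state: dict

    def render(self) -> str:
        # The model reasons over this assembled window; nothing here
        # is baked into model weights.
        policies = "\n".join(self.policy_chunks)
        return f"POLICIES:\n{policies}\n\nACCOUNT STATE:\n{self.account_state}"

def build_context(user_id: str, market: str, query: str) -> PromptContext:
    return PromptContext(
        policy_chunks=search_policies(query, market),
        account_state=fetch_account_state(user_id),
    )

print(build_context("u-42", "SE", "can I change my payment date?").render())
```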

On the infrastructure side, Klarna appears to use a routing layer that distinguishes between fully automatable requests (track my order, change payment date) and escalation cases (fraud disputes, complex merchant conflicts). This is a common pattern in enterprise support automation — route high-confidence, low-stakes queries to full automation, route ambiguous or high-stakes queries to a human-in-the-loop flow. The fact that Klarna publicly walked back some of their “AI replaced 700 agents” claims suggests the escalation rate on complex queries remained higher than their initial projections.
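A routing layer like that can be a small amount of code. The sketch below is illustrative rather than Klarna’s implementation: the intent categories, the confidence threshold, and the Route names are invented, but the shape (a hard escalation list plus a confidence gate on everything else) is the common pattern.

```python
# Illustrative routing layer, not Klarna's implementation: classify
# each query before it reaches the model, and let stakes plus
# classifier confidence decide the path.
from enum import Enum

class Route(Enum):
    FULL_AUTOMATION = "full_automation"
    HUMAN_IN_THE_LOOP = "human_in_the_loop"

# Low-stakes intents that are candidates for full automation (assumed set).
AUTOMATABLE_INTENTS = {"track_order", "change_payment_date", "resend_receipt"}
# High-stakes intents that always get a human in the loop (assumed set).
ESCALATION_INTENTS = {"fraud_dispute", "merchant_conflict", "chargeback"}

def route(intent: str, confidence: float, threshold: float = 0.85) -> Route:
    # High-stakes intents escalate regardless of classifier confidence.
    if intent in ESCALATION_INTENTS:
        return Route.HUMAN_IN_THE_LOOP
    # Low-stakes intents still escalate when the classifier is unsure.
    if intent in AUTOMATABLE_INTENTS and confidence >= threshold:
        return Route.FULL_AUTOMATION
    return Route.HUMAN_IN_THE_LOOP

assert route("track_order", 0.97) is Route.FULL_AUTOMATION
assert route("fraud_dispute", 0.99) is Route.HUMAN_IN_THE_LOOP
assert route("track_order", 0.60) is Route.HUMAN_IN_THE_LOOP
```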

Cost control is clearly a first-order concern in a payments business operating at consumer scale. The likely approach combines aggressive caching of common policy lookups, tiered context window management (don’t send 10k tokens when 800 will do), and model-tier routing: smaller, cheaper models for simple account queries, more capable models for dispute reasoning. The company has not published inference cost figures.
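Here is a rough sketch of what those three levers might look like together: an LRU cache over policy lookups, a model-tier table, and crude token budgeting. The model names, the characters-per-token heuristic, and the cache size are placeholders, not anything Klarna has published.

```python
# Placeholder cost-control sketch: model names, the chars-per-token
# heuristic, and the cache size are invented, not Klarna's numbers.
from functools import lru_cache

# Tiered model selection: a cheap model for simple lookups, a more
# capable one for multi-step dispute reasoning.
MODEL_TIERS = {
    "simple_lookup": "small-fast-model",        # placeholder name
    "dispute_reasoning": "large-capable-model", # placeholder name
}

def pick_model(query_class: str) -> str:
    # Default to the capable tier when the query class is unknown.
    return MODEL_TIERS.get(query_class, MODEL_TIERS["dispute_reasoning"])

def expensive_policy_search(market: str, topic: str) -> str:
    # Stand-in for the real retrieval call.
    return f"policy text for {topic} in {market}"

# Common policy questions repeat across users, so retrieval results
# can be cached aggressively (policies change slowly).
@lru_cache(maxsize=10_000)
def cached_policy_lookup(market: str, topic: str) -> str:
    return expensive_policy_search(market, topic)

def trim_context(chunks: list[str], budget_tokens: int = 800) -> list[str]:
    # Crude token budgeting: stop adding chunks once the estimated
    # budget is hit, rather than always sending the full window.
    kept, used = [], 0
    for chunk in chunks:
        est = len(chunk) // 4  # rough ~4 chars/token heuristic
        if used + est > budget_tokens:
            break
        kept.append(chunk)
        used += est
    return kept

print(pick_model("simple_lookup"))           # small-fast-model
print(cached_policy_lookup("SE", "returns")) # cached on repeat calls
```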

The Smart Decision

Klarna built the assistant inside their existing app rather than as a separate surface, which sounds obvious but has significant architectural consequences. By embedding the assistant in an authenticated session, they get access to the full customer account graph at query time: transaction history, active payment plans, dispute status, linked merchants. This means the model is never reasoning in the abstract — it’s reasoning about this customer’s specific situation.

That design choice eliminates an entire class of hallucination risk. A generic financial AI assistant answering “can I change my payment date” has to either hedge or guess. Klarna’s assistant can look up whether you have a payment due, when it is, and whether your account is eligible for a change — and then give you a direct answer. The RAG isn’t just policy retrieval; it’s transactional state retrieval. That’s a harder system to build, but it’s why the deflection rate they’ve cited is plausible for the query types it handles well.
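As a sketch, the idea looks like this: compute eligibility deterministically from account state, then hand the model a fact to phrase rather than a question to guess at. The PaymentPlan fields and the one-change-per-plan rule are invented for illustration.

```python
# Sketch of transactional state retrieval: the eligibility answer is
# computed deterministically, so the model phrases a fact instead of
# guessing. PaymentPlan fields and the change limit are assumptions.
from dataclasses import dataclass
from datetime import date

@dataclass
class PaymentPlan:
    next_due: date
    date_changes_used: int
    max_date_changes: int = 1  # assumed policy limit, for illustration

def date_change_eligibility(plan: PaymentPlan, today: date) -> str:
    # Deterministic answer from account state, injected into the
    # model's context alongside the user's question.
    if plan.date_changes_used >= plan.max_date_changes:
        return "Not eligible: the date change for this plan was already used."
    if plan.next_due <= today:
        return "Not eligible: the payment is already due."
    return f"Eligible: the payment due {plan.next_due} can be moved."

plan = PaymentPlan(next_due=date(2026, 6, 1), date_changes_used=0)
print(date_change_eligibility(plan, today=date(2026, 5, 11)))
```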

The Tradeoff

Klarna made a bet on a single external model provider at a scale that creates meaningful vendor dependency. The entire automation layer runs on API calls to OpenAI, which means model deprecations, pricing changes, and outages are infrastructure events for Klarna’s customer service operation, not just product inconveniences. When OpenAI retired older model versions earlier this year, any company on those endpoints had to re-evaluate and potentially re-test prompt behavior on the successor models.

This tradeoff also affects what Klarna can fine-tune or specialize. Operating through the standard API means limited ability to adjust model behavior at the weight level — they’re working within the constraint of prompt engineering and retrieval design rather than training a model on their specific dispute resolution patterns or tone requirements. A company like Klarna, with enormous volumes of labeled customer service interactions, likely has the data to fine-tune effectively. The question is whether the engineering cost and compliance overhead of running that infrastructure is worth it versus continued API consumption. Based on public information, they haven’t made that move.

What You Can Steal

  • Authenticated context beats generic context. If your users are logged in, the single highest-leverage thing you can do is inject their actual account state into the model context. It narrows the solution space and cuts hallucination risk simultaneously.

  • Model routing on query complexity is cheaper than you think and more important than it looks. Classify incoming queries into tiers before they hit your most capable (and expensive) model. Simple lookup queries don’t need frontier reasoning — they need fast retrieval and a clean answer.

  • Don’t fine-tune first. Klarna’s system appears to get significant mileage from retrieval and state injection without custom model training. RAG with well-structured policy documents is faster to iterate, easier to audit for regulatory compliance, and updatable without a training run.

  • Your escalation design is your actual product. The queries that break your automation reveal your real coverage gaps. Build the escalation path first, instrument it heavily (a sketch follows this list), and let failure data drive where you invest next — whether that’s better retrieval, better prompts, or a harder conversation about what shouldn’t be automated.

  • Be careful about the headcount story. Klarna’s public framing created a PR cycle they had to partially reverse. If you’re deploying at scale in a regulated domain, under-claim deflection rates externally while measuring them rigorously internally. The gap between “AI handles X% of volume” and “AI resolves X% of cases” is where the real engineering work lives.
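As promised above, a small sketch of escalation instrumentation. The event shape, reason codes, and in-memory log are hypothetical; the pattern worth stealing is counting (intent, reason) pairs so the most frequent failures point at the next investment.

```python
# Hypothetical escalation instrumentation: record why each
# conversation left automation, then count (intent, reason) pairs
# to find the real coverage gaps. Reason codes are invented.
from collections import Counter
from dataclasses import dataclass

@dataclass
class EscalationEvent:
    conversation_id: str
    intent: str
    reason: str  # e.g. "low_confidence", "policy_gap", "user_requested"
    model_confidence: float

LOG: list[EscalationEvent] = []

def record_escalation(event: EscalationEvent) -> None:
    # In production this would feed an analytics pipeline; here it
    # appends to an in-memory log.
    LOG.append(event)

def coverage_gaps(top_n: int = 3) -> list[tuple[tuple[str, str], int]]:
    # The most frequent (intent, reason) pairs show where to invest:
    # better retrieval, better prompts, or a scope decision.
    return Counter((e.intent, e.reason) for e in LOG).most_common(top_n)

record_escalation(EscalationEvent("c-1", "merchant_conflict", "policy_gap", 0.41))
record_escalation(EscalationEvent("c-2", "merchant_conflict", "policy_gap", 0.38))
record_escalation(EscalationEvent("c-3", "refund", "low_confidence", 0.55))
print(coverage_gaps())
```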