Library of the Week — Letta — Stochastic Sandbox

Letta — stateful LLM agents with persistent memory, built for production

GitHub · Language: Python · License: Apache 2.0

What it does

Letta (formerly MemGPT) gives LLM agents long-term memory that persists across conversations — not just a context window hack, but a first-class memory layer with storage backends. It’s aimed at developers building agents that need to remember users, accumulate knowledge over time, or run autonomously for extended periods without losing state.

Why it stands out

Memory as a first-class primitive: agents manage their own memory via tool calls — writing to archival storage, searching recall memory, and editing core memory blocks rather than stuffing everything into a prompt
Stateful agent REST API out of the box: spin up the server and you get persistent agents with IDs, resumable across restarts, no DIY state management required
Multi-agent support: agents can message each other natively, enabling supervisor/subagent patterns without glue code
Model-agnostic: works with OpenAI (GPT-5.5, GPT-4.1 Nano), Anthropic (Claude Opus 4.8, Haiku 4.5), Gemini (3.5 Flash, 3.1 Pro), and local models via Ollama-compatible endpoints
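
To make the first point concrete, here is a toy, in-process sketch of the pattern: a small "core" memory that is always rendered into the prompt, an "archival" store that is only searched on demand, and a dispatcher that applies the tool calls a model would emit. This is not Letta's implementation or API. The class name, tool names, and substring search are all assumptions made for illustration (a real system would typically use embedding search and a database).

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Illustrative stand-in for agent memory tiers (hypothetical, not Letta's classes)."""

    core: dict[str, str] = field(default_factory=dict)  # always placed in the prompt
    archival: list[str] = field(default_factory=list)   # searched only on demand

    def core_memory_append(self, label: str, text: str) -> None:
        """Tool: append a line to one core memory block."""
        existing = self.core.get(label, "")
        self.core[label] = f"{existing}\n{text}".strip()

    def archival_insert(self, text: str) -> None:
        """Tool: store a passage outside the prompt."""
        self.archival.append(text)

    def archival_search(self, query: str) -> list[str]:
        """Tool: naive case-insensitive substring search (assumption: stands in for vector search)."""
        needle = query.lower()
        return [passage for passage in self.archival if needle in passage.lower()]

    def dispatch(self, name: str, arguments: dict) -> object:
        """Run a tool call emitted by the model, restricted to an explicit allowlist."""
        tools = {
            "core_memory_append": self.core_memory_append,
            "archival_insert": self.archival_insert,
            "archival_search": self.archival_search,
        }
        return tools[name](**arguments)

    def render_prompt(self) -> str:
        """Only core memory is rendered into the prompt; archival stays out of it."""
        return "\n".join(f"<{label}>\n{value}\n</{label}>" for label, value in self.core.items())


# Simulate three tool calls a model might emit during a conversation.
memory = ToyAgentMemory(core={"human": "User is a backend developer."})
memory.dispatch("core_memory_append", {"label": "human", "text": "Prefers concise answers."})
memory.dispatch("archival_insert", {"text": "The billing service uses PostgreSQL."})
hits = memory.dispatch("archival_search", {"query": "postgresql"})
prompt = memory.render_prompt()
```

The design point is that the prompt stays small: only the core blocks are rendered every turn, while archival passages enter the context only when the agent chooses to search for them.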

Quick start

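import os

from letta_client import Letta

# Reads the API key from the environment (os.getenv returns None if LETTA_API_KEY is unset).
client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Create a persistent agent seeded with two core memory blocks.
agent = client.agents.create(
    model="openai/gpt-5.5",
    memory_blocks=[
        {"label": "human", "value": "User is a backend developer who prefers concise answers."},
        {"label": "persona", "value": "I am a helpful assistant with excellent memory."},
    ],
)

# Send a user message to the agent, addressed by its ID.
response = client.agents.messages.create(
    agent_id=agent.id,
    input="What stack should I use for a high-throughput API?",
)

# A response may include several messages, not only the final reply.
for message in response.messages:
    print(message)

# Later: reuse the same agent.id in a new session and the agent keeps its memory.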
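
Since memory is tied to the agent ID, your application has to keep that ID somewhere between runs. A minimal sketch, using only the standard library: `get_or_create_agent_id`, the JSON state file, and `fake_create_agent` are hypothetical names for illustration. In real code the factory would presumably wrap the `client.agents.create(...)` call above and return `agent.id`.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_or_create_agent_id(state_file: Path, create_agent: Callable[[], str]) -> str:
    """Return the saved agent ID, creating and saving one only on the first run."""
    if state_file.exists():
        return json.loads(state_file.read_text())["agent_id"]
    agent_id = create_agent()
    state_file.write_text(json.dumps({"agent_id": agent_id}))
    return agent_id


created = []


def fake_create_agent() -> str:
    """Stand-in for a real agent-creation call (assumption: it returns the new agent's ID)."""
    created.append("agent-123")
    return "agent-123"


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "agent_state.json"
    first_run = get_or_create_agent_id(state_file, fake_create_agent)   # creates and saves
    second_run = get_or_create_agent_id(state_file, fake_create_agent)  # reads the saved ID
```

A file is the simplest option for a script. A multi-user service would more likely store the agent ID on the user's record in its own database.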

When to use it

User-personalized agents: chatbots or assistants that need to remember preferences, history, and context across many separate sessions
Long-running autonomous tasks: workflows where an agent accumulates knowledge incrementally — research assistants, coding agents that learn a codebase over time
Multi-agent systems where shared memory matters: pipelines where agents need to read/write a common knowledge store rather than just pass messages
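
The shared-memory case in the last item can be sketched in a few lines: two agents hold a reference to the same block, so what one writes is visible in the other's context without any message being passed. This is a toy model with hypothetical class names (`SharedBlock`, `ToyAgent`), not Letta's API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBlock:
    """A labeled piece of memory that several agents can attach to."""

    label: str
    value: str = ""


@dataclass
class ToyAgent:
    name: str
    blocks: dict[str, SharedBlock] = field(default_factory=dict)

    def attach(self, block: SharedBlock) -> None:
        # Store a reference to the same object, not a copy.
        self.blocks[block.label] = block

    def note(self, label: str, text: str) -> None:
        """Append an attributed line to an attached block."""
        block = self.blocks[label]
        block.value = f"{block.value}\n{self.name}: {text}".strip()

    def context(self) -> str:
        """What this agent would see in its prompt."""
        return "\n".join(block.value for block in self.blocks.values())


findings = SharedBlock(label="findings")
researcher = ToyAgent("researcher")
writer = ToyAgent("writer")
researcher.attach(findings)
writer.attach(findings)

# The researcher writes; the writer sees it without receiving a message.
researcher.note("findings", "The API rate limit is per key.")
writer_view = writer.context()
```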

When to skip it

Stateless RAG pipelines: if your use case is purely retrieval-augmented generation with no need for persistent agent identity, Letta adds overhead for no benefit — a leaner retrieval stack is simpler
Latency-sensitive applications: the memory tool-call loop adds round trips; if sub-second response time is critical, simpler context management will serve you better
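
A back-of-the-envelope way to check the latency point against your own budget, assuming each memory tool call costs one extra model round trip plus some tool execution time. The function name and the numbers below are hypothetical placeholders, not measurements of Letta.

```python
def estimated_latency_ms(
    model_call_ms: float,
    memory_tool_calls: int,
    tool_overhead_ms: float = 0.0,
) -> float:
    """Rough turn latency: one final model call plus one extra call per memory tool call."""
    model_time = model_call_ms * (1 + memory_tool_calls)
    tool_time = tool_overhead_ms * memory_tool_calls
    return model_time + tool_time


# Hypothetical inputs: 400 ms per model call, 50 ms per tool execution.
no_memory_ops = estimated_latency_ms(400, memory_tool_calls=0)
two_memory_ops = estimated_latency_ms(400, memory_tool_calls=2, tool_overhead_ms=50)
over_budget = two_memory_ops > 1000  # compare against a one-second budget
```

Under these assumed numbers, a turn with two memory operations takes roughly three times as long as a turn with none, which is the overhead to weigh against a sub-second target.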

The verdict

Letta solves a real architectural problem that most agent frameworks paper over — what happens to memory between sessions. The stateful server model is genuinely production-ready, and the memory abstraction is cleaner than rolling your own with a vector DB and session store. If you’re building agents that need to know users over time rather than just respond to them, Letta is the most thoughtfully designed solution in the current open-source ecosystem.