Library of the Week — Letta
A weekly teardown of one open-source AI/ML library: what it does, why it stands out, and when to use it.
Letta — stateful LLM agents with persistent memory, built for production
GitHub · Language: Python · License: Apache 2.0
What it does
Letta (formerly MemGPT) gives LLM agents long-term memory that persists across conversations — not just a context window hack, but a first-class memory layer with storage backends. It’s aimed at developers building agents that need to remember users, accumulate knowledge over time, or run autonomously for extended periods without losing state.
Why it stands out
- Memory as a first-class primitive: agents manage their own memory via tool calls — writing to archival storage, searching recall memory, and editing core memory blocks rather than stuffing everything into a prompt
- Stateful agent REST API out of the box: spin up the server and you get persistent agents with IDs, resumable across restarts, no DIY state management required
- Multi-agent support: agents can message each other natively, enabling supervisor/subagent patterns without glue code
- Model-agnostic: works with OpenAI (GPT-5.5, GPT-4.1 Nano), Anthropic (Claude Opus 4.8, Haiku 4.5), Gemini (3.5 Flash, 3.1 Pro), and local models via Ollama-compatible endpoints
Quick start
from letta_client import Letta
import os
client = Letta(api_key=os.getenv("LETTA_API_KEY"))
agent = client.agents.create(
model="openai/gpt-5.5",
memory_blocks=[
{"label": "human", "value": "User is a backend developer who prefers concise answers."},
{"label": "persona", "value": "I am a helpful assistant with excellent memory."},
],
)
response = client.agents.messages.create(
agent_id=agent.id,
input="What stack should I use for a high-throughput API?",
)
for message in response.messages:
print(message)
# Later — same agent_id retains full memory across sessions
When to use it
- User-personalized agents: chatbots or assistants that need to remember preferences, history, and context across many separate sessions
- Long-running autonomous tasks: workflows where an agent accumulates knowledge incrementally — research assistants, coding agents that learn a codebase over time
- Multi-agent systems where shared memory matters: pipelines where agents need to read/write a common knowledge store rather than just pass messages
When to skip it
- Stateless RAG pipelines: if your use case is purely retrieval-augmented generation with no need for persistent agent identity, Letta adds overhead for no benefit — a leaner retrieval stack is simpler
- Latency-sensitive applications: the memory tool-call loop adds round trips; if sub-second response time is critical, simpler context management will serve you better
The verdict
Letta solves a real architectural problem that most agent frameworks paper over — what happens to memory between sessions. The stateful server model is genuinely production-ready, and the memory abstraction is cleaner than rolling your own with a vector DB and session store. If you’re building agents that need to know users over time rather than just respond to them, Letta is the most thoughtfully designed solution in the current open-source ecosystem.