Builders Spotlight — Haystack — Stochastic Sandbox

Haystack

An end-to-end framework for building production search and retrieval-augmented generation (RAG) pipelines, built by deepset.

The problem it set out to solve

Most teams building RAG systems in 2020-2021 were stitching together incompatible pieces: vector databases, retrievers, rerankers, language models, each with different interfaces and assumptions. There was no coherent way to wire them together, test them, or move from prototype to production. deepset saw teams constantly reinventing the same plumbing—and getting it wrong in subtle ways that only surfaced at scale.

The key insight

Haystack’s core philosophy is that RAG is a pipeline problem, not a library problem. Rather than optimizing for a single retrieval strategy or embedding model, Haystack treats the entire flow (document ingestion, embedding, retrieval, reranking, generation) as a directed graph whose nodes, called components, are composable and reusable. That makes it possible to swap out one piece without rewriting the others, and to build reproducible retrieval systems in which each stage can be versioned, debugged, and iterated on independently.
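
To make the “pipeline as a directed graph” idea concrete, here is a minimal pure-Python sketch of how an orchestrator can run components in dependency order and pass named outputs to named inputs. It is not Haystack’s implementation. `run_graph`, the toy components, and the `"sender.output"` / `"receiver.input"` edge strings (loosely modeled on Haystack’s `connect("retriever.documents", ...)` style) are illustrative assumptions.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical: a "component" is any function that returns a dict of named outputs.
Component = Callable[..., dict[str, Any]]


def run_graph(
    components: dict[str, Component],
    edges: list[tuple[str, str]],
    inputs: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Run components in dependency order; edges are ("sender.output", "receiver.input")."""
    predecessors: dict[str, set[str]] = {name: set() for name in components}
    for sender, receiver in edges:
        predecessors[receiver.split(".")[0]].add(sender.split(".")[0])

    outputs: dict[str, dict[str, Any]] = {}
    for name in TopologicalSorter(predecessors).static_order():
        kwargs = dict(inputs.get(name, {}))  # user-supplied inputs for this component
        for sender, receiver in edges:
            sender_name, output = sender.split(".")
            receiver_name, input_name = receiver.split(".")
            if receiver_name == name:  # wire an upstream output into this input
                kwargs[input_name] = outputs[sender_name][output]
        outputs[name] = components[name](**kwargs)
    return outputs


# Toy stand-ins for retrieval, prompt building, and generation.
CORPUS = ["pipelines form graphs", "components stay reusable", "bananas taste sweet"]


def retrieve(query: str) -> dict[str, Any]:
    words = set(query.lower().split())
    return {"documents": [d for d in CORPUS if words & set(d.split())]}


def build_prompt(query: str, documents: list[str]) -> dict[str, Any]:
    return {"prompt": "Context:\n" + "\n".join(documents) + f"\nQuestion: {query}"}


def echo_llm(prompt: str) -> dict[str, Any]:
    # Not a model: it just echoes the question line so the data flow is easy to inspect.
    return {"replies": [prompt.splitlines()[-1]]}


question = "why are pipelines reusable"
result = run_graph(
    {"retriever": retrieve, "prompt_builder": build_prompt, "llm": echo_llm},
    [("retriever.documents", "prompt_builder.documents"), ("prompt_builder.prompt", "llm.prompt")],
    {"retriever": {"query": question}, "prompt_builder": {"query": question}},
)
print(result["llm"]["replies"][0])
```

A plain topological sort only handles acyclic graphs. Haystack 2.x pipelines can also contain loops (for example, to retry or self-correct a generation), which this sketch deliberately ignores.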

This approach meant prioritizing interfaces over implementations: a retriever is just something that implements a retrieval interface, whether it’s BM25, dense vector search, or hybrid. A reranker is anything that scores and ranks. This abstraction, borrowed from data engineering practices, made Haystack feel less like a “framework you’re locked into” and more like a toolkit you control.
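
The “interfaces over implementations” point can be sketched with a Python `Protocol`: calling code depends only on a `run(query, top_k)` method, so a keyword retriever, a vector-style retriever, and a hybrid of the two are interchangeable. Everything here is hypothetical and is not Haystack’s actual class hierarchy. The keyword scorer is a toy (not real BM25), the “dense” scorer uses bag-of-words counts instead of learned embeddings, and the hybrid uses reciprocal rank fusion, one common way to merge ranked lists.

```python
import math
from collections import Counter
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything with this run() method counts as a retriever."""

    def run(self, query: str, top_k: int = 3) -> list[str]: ...


def tokens(text: str) -> list[str]:
    return text.lower().split()


class KeywordRetriever:
    """Sparse stand-in: scores by number of query-term matches (a toy, not real BM25)."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def run(self, query: str, top_k: int = 3) -> list[str]:
        terms = set(tokens(query))
        scored = [(sum(t in terms for t in tokens(d)), d) for d in self.docs]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)  # stable for ties
        return [d for score, d in ranked if score > 0][:top_k]


class CosineRetriever:
    """Dense stand-in: cosine similarity of bag-of-words count vectors (no real embeddings)."""

    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def run(self, query: str, top_k: int = 3) -> list[str]:
        q = Counter(tokens(query))
        q_norm = math.sqrt(sum(v * v for v in q.values()))
        scored = []
        for d in self.docs:
            dv = Counter(tokens(d))
            d_norm = math.sqrt(sum(v * v for v in dv.values()))
            dot = sum(q[t] * dv[t] for t in q)
            scored.append((dot / (q_norm * d_norm) if q_norm and d_norm else 0.0, d))
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [d for score, d in ranked if score > 0][:top_k]


class HybridRetriever:
    """Fuses any two Retrievers with reciprocal rank fusion: score = sum of 1 / (k + rank)."""

    def __init__(self, first: Retriever, second: Retriever, k: int = 60) -> None:
        self.retrievers = (first, second)
        self.k = k

    def run(self, query: str, top_k: int = 3) -> list[str]:
        fused: dict[str, float] = {}
        for retriever in self.retrievers:
            for rank, doc in enumerate(retriever.run(query, top_k), start=1):
                fused[doc] = fused.get(doc, 0.0) + 1.0 / (self.k + rank)
        return sorted(fused, key=fused.get, reverse=True)[:top_k]


def top_hit(retriever: Retriever, query: str) -> str:
    """Calling code depends only on the interface, never on the implementation."""
    hits = retriever.run(query, top_k=1)
    return hits[0] if hits else ""


docs = ["dense vector search", "bm25 keyword search", "bananas are yellow"]
for r in (KeywordRetriever(docs), CosineRetriever(docs),
          HybridRetriever(KeywordRetriever(docs), CosineRetriever(docs))):
    print(type(r).__name__, top_hit(r, "keyword search"))
```

Note that the hybrid retriever is itself just another `Retriever`, which is what lets composed pieces slot back into the same place in a pipeline.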

How it works (in plain terms)

Haystack 2.0 introduced a declarative pipeline system in which you define components (retrievers, embedders, LLM generators, etc.) and explicitly connect their named outputs to other components’ inputs. The framework handles orchestration, serialization, and execution: you can run a pipeline locally during development, serialize it to YAML for reproducibility, and load the same definition in production. Under the hood, document stores for different backends (Elasticsearch, Weaviate, Pinecone, or an in-memory store for local work) share a common interface, so switching databases mostly means replacing the document store and its matching retriever while the rest of the pipeline stays unchanged.
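
The serialize-then-rebuild idea can be sketched without Haystack: a pipeline becomes plain data (a component name, a type string, and constructor arguments), and a registry turns that data back into objects. `REGISTRY`, `build`, `ListStore`, and `PrefixStore` are hypothetical. The `type` / `init_parameters` keys loosely echo the shape of Haystack’s serialized components, and JSON stands in for YAML so the sketch needs only the standard library.

```python
import json
from typing import Any

# Hypothetical registry mapping a "type" string in the config to a Python class.
REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    REGISTRY[cls.__name__] = cls
    return cls


@register
class ListStore:
    """Toy backend: substring search over a list of documents."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    def search(self, query: str) -> list[str]:
        return [d for d in self.documents if query.lower() in d.lower()]


@register
class PrefixStore:
    """A different toy backend exposing the same search() method."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    def search(self, query: str) -> list[str]:
        return [d for d in self.documents if d.lower().startswith(query.lower())]


def build(config: dict[str, Any]) -> dict[str, Any]:
    """Instantiate every component described in the config."""
    return {
        name: REGISTRY[spec["type"]](**spec["init_parameters"])
        for name, spec in config["components"].items()
    }


config = {
    "components": {
        "store": {
            "type": "ListStore",
            "init_parameters": {"documents": ["Haystack pipelines", "Search with Haystack"]},
        }
    }
}

serialized = json.dumps(config, indent=2)  # the artifact you would commit to version control
restored = build(json.loads(serialized))
print(restored["store"].search("haystack"))

swapped = json.loads(serialized)
swapped["components"]["store"]["type"] = "PrefixStore"  # swap the backend in config only
print(build(swapped)["store"].search("haystack"))
```

The code that calls `search()` never changes; only the data describing which class to build does.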

What it looks like in practice

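# Assumes Haystack 2.x (pip install haystack-ai) and OPENAI_API_KEY set in the environment.
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# A toy in-memory document store; swap in any supported backend and its matching retriever.
ds = InMemoryDocumentStore()
ds.write_documents([Document(content="Replace this with your own document text.")])

# Jinja2 template: receives the retrieved documents and the user's query.
template = """Answer the question using only these documents:
{% for doc in documents %}{{ doc.content }}
{% endfor %}
Question: {{ query }}"""

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=ds))
pipeline.add_component("prompt_builder", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator())  # API key is read from OPENAI_API_KEY

pipeline.connect("retriever.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

question = "How does X work?"
result = pipeline.run({"retriever": {"query": question}, "prompt_builder": {"query": question}})
print(result["llm"]["replies"][0])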

Why it matters

Reproducibility for retrieval: Because pipelines are declarative and serializable, you can version them alongside your code, iterate on retrieval strategies without touching application logic, and track what changed and why.
Production-oriented building blocks: Haystack offers support for caching, batching, async execution, and error handling instead of leaving teams to hand-roll them, which helps avoid brittle RAG systems that break at scale.
Reduced lock-in: The component abstraction means switching from Elasticsearch to a dedicated vector database, or from OpenAI to a local model, is mostly a matter of swapping components (or editing the serialized YAML), not a rewrite.
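
One way to get the “track what changed” benefit from a serialized pipeline is to fingerprint its config and diff it per component. The sketch below is a hypothetical helper, not a Haystack feature: `fingerprint` hashes a canonical JSON form so key order does not matter, and `changed_components` lists which components were added, removed, or reconfigured. The component type names and model names are made-up placeholders.

```python
import hashlib
import json
from typing import Any


def fingerprint(config: dict[str, Any]) -> str:
    """Stable short hash of a pipeline config: same settings give the same ID."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def changed_components(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Names of components that were added, removed, or reconfigured."""
    names = set(old["components"]) | set(new["components"])
    return sorted(n for n in names if old["components"].get(n) != new["components"].get(n))


# Hypothetical configs: swapping a hosted generator for a local one.
baseline = {
    "components": {
        "retriever": {"type": "BM25Retriever", "init_parameters": {"top_k": 5}},
        "llm": {"type": "HostedGenerator", "init_parameters": {"model": "model-a"}},
    }
}
candidate = json.loads(json.dumps(baseline))  # deep copy via JSON round trip
candidate["components"]["llm"] = {"type": "LocalGenerator", "init_parameters": {"model": "model-b"}}

print(fingerprint(baseline), fingerprint(candidate))
print(changed_components(baseline, candidate))  # ['llm']
```

Logging the fingerprint next to evaluation results ties each score to the exact pipeline configuration that produced it.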

Where to go next

Haystack GitHub — main repository with examples and integrations
Official Docs & Tutorials — well-maintained getting-started guides and API reference
“Building Production RAG Systems” blog series — deepset’s practical guides on evaluation, hybrid search, and scaling