
Office Hours — Why don't we have a functional DSL for data+embedding+API pipelines?

A daily developer question about AI/LLMs, answered with a direct, opinionated take.

Daily: one question from the trenches, one opinionated answer.

Why don’t we have a functional DSL for data+embedding+API pipelines?

We kind of do, but they’re all fragmented and solving the wrong problems. You’ve got LangChain for orchestration, Hugging Face’s tools for embeddings, Pydantic for validation, FastAPI for serving. They work together but require gluing. The real blocker isn’t technical; it’s that the problem space keeps shifting faster than any DSL can stabilize.

A year ago, RAG was the pattern. Then it was graph-RAG; now it’s multi-tier retrieval with rerankers. Last month people were obsessed with agentic architectures; this week they’re debugging why ReAct agents waste 90% of their retries on hallucinated tool calls rather than on recoverable errors. The tooling can’t keep pace because the fundamentals are still moving.

Why Generic Tools Fall Apart at Scale

The second issue is that “data+embedding+API pipelines” aren’t actually a coherent abstraction once you hit production. A real DSL would need to handle data freshness guarantees, embedding staleness, cache invalidation strategies, fallback models, inference cost constraints, and latency SLOs, all of which change based on your specific use case. Zapier tries this; it’s mostly decorative.

Consider a real example: a RAG pipeline serving customer support queries. You need embeddings refreshed when your knowledge base updates (daily? hourly?), but refreshing everything is expensive. You need the embedding model to stay consistent. Switching from text-embedding-3-small to a newer model invalidates cached vectors. You need fallback behavior when your primary LLM hits rate limits. You need per-query cost tracking because your SLA is “spend no more than $0.02 per query on inference.” A DSL that handles all of this generically either becomes incomprehensibly complex or forces you into constraints that don’t match your actual business requirements.
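
To make those dimensions concrete, here’s a rough sketch of what a per-pipeline config would have to capture. The names and defaults are illustrative, not any existing framework’s API:

from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class PipelineConfig:
    # Every field below is a business decision, and each one shifts over time.
    embedding_model: str = "text-embedding-3-small"      # version-pinned
    refresh_interval: timedelta = timedelta(hours=24)     # freshness vs. embedding cost
    fallback_model: str = "gpt-4.1-nano"                  # route here when the primary is rate-limited
    max_cost_per_query_usd: float = 0.02                  # the inference-cost SLA
    latency_slo_ms: int = 1500                            # illustrative p95 target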

Different teams end up with different priority orderings: one team optimizes for latency (parallel embedding lookups plus streaming responses), another for cost (aggressive prompt caching plus batch processing), another for freshness (real-time embedding updates). The DSL that excels at one is a straitjacket for the others.

The Cost/Freshness Tradeoff

Here’s where it gets concrete. Suppose you’re running that support pipeline and embedding 100K articles daily. At text-embedding-3-small’s $0.02 per 1M tokens, with articles averaging 500 tokens each, a full daily embedding run costs roughly $1. But if you re-embed on every update (10-20 times daily in a busy org), you’re looking at $10-20/day just to keep vectors fresh. Most teams compromise: batch daily, accept 24-hour staleness, and live with the stale answers that produces whenever the knowledge base has moved on.

Now add Claude Opus 4.7 for reranking at $15 per 1M input tokens. A single query might require reranking the top 50 results, averaging 200 tokens each, costing $0.15 per query before you even answer the user. Suddenly that $0.02 SLA looks optimistic. A generic DSL can’t resolve this tradeoff because the answer depends on whether your customers care more about freshness or cost. And that answer changes quarter to quarter as volume grows.
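
If you want to sanity-check those numbers, the arithmetic is simple enough to keep in a notebook:

# Daily embedding refresh: 100K articles * 500 tokens, text-embedding-3-small at $0.02 / 1M tokens.
articles, avg_tokens = 100_000, 500
embed_cost_per_refresh = articles * avg_tokens / 1_000_000 * 0.02   # $1.00 per full refresh
print(f"1 refresh/day:   ${embed_cost_per_refresh:.2f}")
print(f"10-20 refreshes: ${embed_cost_per_refresh * 10:.0f}-{embed_cost_per_refresh * 20:.0f}/day")

# Reranking: top 50 results * 200 tokens per query at $15 / 1M input tokens.
rerank_cost_per_query = 50 * 200 / 1_000_000 * 15.0                 # $0.15 per query
print(f"rerank per query: ${rerank_cost_per_query:.2f}")            # already 7.5x the $0.02 SLA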

What Actually Works

Instead of fighting generic tools, teams are standardizing on plain Python with async patterns and explicit state machines for their pipelines. Boring, but it works. Here’s roughly what that looks like in practice:

import asyncio
from dataclasses import dataclass
from datetime import datetime, timezone

# Chunk and KnowledgeBase are your own domain types; merge_and_dedupe and
# rerank are pipeline helpers defined elsewhere.

@dataclass(frozen=True)
class RetrievalResult:
    chunks: list[Chunk]
    embedding_model: str
    retrieved_at: datetime
    bm25_scores: list[float]
    semantic_scores: list[float]

async def retrieve(query: str, kb: KnowledgeBase) -> RetrievalResult:
    # Run lexical and semantic retrieval concurrently.
    bm25, semantic = await asyncio.gather(
        kb.bm25_search(query, top_k=20),
        kb.semantic_search(query, top_k=20),
    )
    merged = merge_and_dedupe(bm25, semantic)
    reranked = await rerank(merged, query)  # falls back to score fusion if reranker fails
    return RetrievalResult(
        chunks=reranked,
        embedding_model=kb.embedding_model_version,
        retrieved_at=datetime.now(timezone.utc),
        bm25_scores=[r.bm25 for r in reranked],
        semantic_scores=[r.semantic for r in reranked],
    )

The frozen dataclass matters. Immutable state flowing through the pipeline makes debugging tractable. When a query returns garbage, you can log the full RetrievalResult and replay the generation step without re-running retrieval. That alone is worth more than any orchestration framework’s built-in tracing.
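
As a sketch of what that replay loop can look like, assuming Chunk is itself a dataclass and a hypothetical llm client with a complete() coroutine and a build_prompt() helper:

import dataclasses
import json

def log_result(result: RetrievalResult, path: str) -> None:
    # Persist the full retrieval state so the generation step can be replayed offline.
    with open(path, "w") as f:
        json.dump(dataclasses.asdict(result), f, default=str)

async def replay_generation(path: str, llm) -> str:
    # Re-run only generation against a previously logged retrieval; no re-embedding, no re-ranking.
    with open(path) as f:
        saved = json.load(f)
    return await llm.complete(build_prompt(saved["chunks"]))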

For fallback logic, explicit Python with structured exceptions beats DSL configuration. If embedding fails, retry with exponential backoff, then degrade to keyword search. If your primary model is rate-limited, log the event and route to GPT-4.1 Nano automatically. This is more code than a declarative YAML file, but it’s also more testable and requires no new abstraction layer.
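
A minimal sketch of that shape, reusing the kb search calls from the example above; the exception types and the primary/fallback clients (each exposing a complete() coroutine) are assumptions, not any real SDK’s interface:

import asyncio
import logging

logger = logging.getLogger("pipeline")

class EmbeddingFailed(Exception): ...
class RateLimited(Exception): ...

async def search_with_fallback(query: str, kb, retries: int = 3):
    # Retry semantic search with exponential backoff, then degrade to keyword search.
    for attempt in range(retries):
        try:
            return await kb.semantic_search(query, top_k=20)
        except EmbeddingFailed:
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s
    logger.warning("embedding unavailable, degrading to keyword search")
    return await kb.bm25_search(query, top_k=20)

async def generate_with_fallback(prompt: str, primary, fallback) -> str:
    # Route to the cheaper fallback model (e.g. GPT-4.1 Nano) when the primary is rate-limited.
    try:
        return await primary.complete(prompt)
    except RateLimited:
        logger.warning("primary rate-limited, routing to fallback model")
        return await fallback.complete(prompt)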

The frontier models themselves have gotten good enough at reasoning that you don’t need custom DSL syntax for the hard parts. If you need conditional logic in your pipeline, letting the model decide (“if this query is about billing, fetch from the finance KB; otherwise use the general KB”) beats building a declarative rule engine. Claude Opus 4.7 and GPT-5.5 handle that kind of routing reliably enough that the logic doesn’t need to live in config files.
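
If you do hand the routing to the model, it can be a one-word classification call rather than a rule engine; the prompt, KB names, and llm client below are illustrative:

ROUTER_PROMPT = """Classify this support query as 'billing' or 'general'.
Reply with exactly one word.

Query: {query}"""

async def pick_kb(query: str, llm, kbs: dict):
    # Let the model route the query; fall back to the general KB on anything unexpected.
    label = (await llm.complete(ROUTER_PROMPT.format(query=query))).strip().lower()
    return kbs.get(label, kbs["general"])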

One edge case worth flagging: embedding model versioning. Most teams treat the embedding model as infrastructure and forget to version-pin it. When the provider silently updates the model or you migrate to a newer one, your stored vectors become incompatible with new query embeddings. Store the model identifier alongside every vector. Make the mismatch a hard error, not a silent degradation. This is the kind of operational detail no DSL encodes correctly because it requires an opinion about your deployment process.
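
The check itself is a few lines; this sketch assumes the embedding_model_version field from the retrieval example above:

def assert_embedding_version(kb, query_model: str) -> None:
    # Hard error, not silent degradation: vectors from one model must never be
    # compared against query embeddings from another.
    if kb.embedding_model_version != query_model:
        raise RuntimeError(
            f"embedding model mismatch: index built with {kb.embedding_model_version!r}, "
            f"queries embedded with {query_model!r}"
        )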

Bottom line: Stop waiting for the DSL. Build your pipeline as typed Python with Pydantic models and async I/O, keep embeddings and API calls separate, version-pin your embedding models explicitly, and lean on Claude Opus 4.7 or GPT-5.5 for the reasoning parts where you’d otherwise reach for custom DSL syntax. Your bottleneck is data quality or retrieval precision, not plumbing.

Question via Hacker News