Library of the Week — Marimo — Stochastic Sandbox

Marimo — a reactive Python notebook that treats cells as functions, not scripts

GitHub · Language: Python · License: Apache 2.0

What it does

Marimo models the Python notebook as a directed acyclic graph of cells. When you edit or run a cell, marimo automatically re-runs the downstream cells that read the variables it defines, and nothing else. For AI/ML developers, this eliminates the classic “did I run cell 12 before cell 7?” footgun that plagues prototype-to-production pipelines and LLM evaluation dashboards.
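To make the dependency-graph idea concrete, here is a minimal sketch using only the standard library. It is an illustration of the general technique (collect the names each cell assigns and reads, then walk the resulting graph), not marimo's actual implementation; the CELLS dictionary and the function names are hypothetical.

```python
import ast
from collections import deque

# Hypothetical notebook: cell name -> source. Not marimo's real data model.
CELLS = {
    "load": "data = [3, 1, 2]",
    "sort": "ordered = sorted(data)",
    "top": "best = ordered[-1]",
    "label": "title = 'demo'",
}


def defs_and_refs(source):
    """Return (names assigned, names read but not assigned) for one cell."""
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else refs).add(node.id)
    return defs, refs - defs


def build_graph(cells):
    """Map each cell to the cells that read a name it defines."""
    info = {name: defs_and_refs(src) for name, src in cells.items()}
    graph = {name: set() for name in cells}
    for producer, (defs, _) in info.items():
        for consumer, (_, refs) in info.items():
            if producer != consumer and defs & refs:
                graph[producer].add(consumer)
    return graph


def downstream(graph, edited):
    """Breadth-first walk: every cell that must re-run after `edited` changes."""
    seen, queue = [], deque([edited])
    while queue:
        current = queue.popleft()
        for child in sorted(graph[current]):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


graph = build_graph(CELLS)
print(downstream(graph, "load"))   # -> ['sort', 'top']
print(downstream(graph, "label"))  # -> []
```

Editing the "load" cell invalidates "sort" and "top" but leaves "label" alone, which is the behavior the paragraph above describes.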

Why it stands out

No hidden state. Marimo statically analyzes which variables each cell defines and reads, so execution order follows data dependencies rather than the order in which you happened to run cells. Jupyter gives no such guarantee, and that gap bites hardest in eval loops and fine-tuning experiments.
Notebooks are valid Python files. A marimo notebook is stored as a .py file that can be run as a script, deployed as a web app, or imported as a module, with no nbconvert step and no embedded cell outputs cluttering your git diff.
Built-in reactive UI widgets. Sliders, dropdowns, and dataframe viewers are first-class; changing a slider value re-runs dependent cells. This makes building quick prompt-tuning dashboards or RAG parameter explorers genuinely fast.
SQL cells with DuckDB integration. Query a DataFrame or local file with SQL inline, which pairs naturally with embedding/retrieval experiments where you want to slice results quickly.
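The “valid Python files” point can be shown without installing anything. The sketch below embeds what a small marimo notebook looks like on disk, as far as I understand the format (an App object plus one decorated function per cell; details may differ between versions), and inspects it with the standard ast module. The describe_cells helper is hypothetical, written for this example only.

```python
import ast

# Assumed on-disk layout of a small marimo notebook (illustrative only).
NOTEBOOK_SOURCE = '''
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    temperature = mo.ui.slider(0.0, 2.0, step=0.1, value=0.7, label="Temperature")
    temperature
    return (temperature,)


@app.cell
def _(mo, temperature):
    mo.md(f"Sampling at temperature {temperature.value}")
    return


if __name__ == "__main__":
    app.run()
'''


def describe_cells(source):
    """List (inputs, outputs) per @app.cell function using only the stdlib parser."""
    cells = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        if not any(ast.unparse(d) == "app.cell" for d in node.decorator_list):
            continue
        # Function parameters are the names the cell reads from other cells.
        inputs = [arg.arg for arg in node.args.args]
        outputs = []
        for stmt in node.body:
            # A returned tuple lists the names the cell exposes to other cells.
            if isinstance(stmt, ast.Return) and isinstance(stmt.value, ast.Tuple):
                outputs = [ast.unparse(e) for e in stmt.value.elts]
        cells.append((inputs, outputs))
    return cells


for inputs, outputs in describe_cells(NOTEBOOK_SOURCE):
    print(f"reads {inputs} -> defines {outputs}")
```

Because each cell's inputs and outputs are spelled out as function parameters and return values, a diff of this file shows only code changes, and ordinary Python tooling (linters, formatters, ast) works on it unmodified.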

Quick start

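# Cell 1: imports. In the marimo editor, each block below is its own cell.
import marimo as mo
import openai

# Cell 2: bind the widget to a global variable and display it.
# Reactivity comes from the global binding; the last expression renders the widget.
prompt = mo.ui.text(placeholder="Enter a prompt...", label="Prompt")
prompt

# Cell 3: re-runs automatically whenever `prompt` changes.
# mo.stop halts the cell until there is text, so no empty request is sent.
mo.stop(not prompt.value, mo.md("Enter a prompt to begin."))
client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role": "user", "content": prompt.value}],
)
mo.md(response.choices[0].message.content)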

When to use it

LLM prototyping and eval dashboards where you want interactive controls (temperature sliders, model dropdowns) without wiring up a full Streamlit app.
Reproducible experiment notebooks checked into git — clean diffs and deterministic execution make code review on fine-tuning runs or prompt ablations actually tractable.
Sharing internal tools with teammates who aren’t running Jupyter servers; marimo run notebook.py serves it as a standalone web app instantly.

When to skip it

Heavy collaborative editing — real-time multiplayer is still limited compared to Deepnote or Hex if your team lives in shared notebooks.
If your workflow is built around nbformat (.ipynb) tooling — papermill, nbconvert pipelines, existing CI that parses notebook JSON — migration friction is real.

A note on deployment

marimo run notebook.py is a great way to share an internal tool, but keep it behind auth or on a trusted network. CVE-2026-39987 (April 2026) was an unauthenticated RCE in the terminal WebSocket endpoint that was exploited in the wild within ten hours of disclosure. The fix is shipped, but the broader lesson holds: marimo’s “serve a notebook as a web app” path is a real surface, not a toy. Upgrade to the patched release and don’t expose the server directly to the public internet.
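One way to act on that advice is to bind the server to loopback only and put an authenticating reverse proxy in front of it. The launcher below is a minimal sketch under that assumption: marimo_run_command is a hypothetical helper, and the --host, --port, and --headless flags reflect my understanding of the marimo CLI, so confirm them with marimo run --help on your installed version.

```python
import ipaddress
import shlex


def marimo_run_command(notebook, host="127.0.0.1", port=2718):
    """Build a `marimo run` command line, refusing non-loopback bind addresses."""
    # ip_address() also raises ValueError for hostnames such as "localhost",
    # so only literal loopback IPs are accepted in this sketch.
    if not ipaddress.ip_address(host).is_loopback:
        raise ValueError(f"refusing to bind marimo to non-loopback address {host}")
    return ["marimo", "run", notebook, "--host", host, "--port", str(port), "--headless"]


cmd = marimo_run_command("notebook.py")
print(shlex.join(cmd))
# To launch for real: subprocess.run(cmd, check=True), with the proxy handling auth.
```

Passing host="0.0.0.0" raises ValueError instead of exposing the server on every interface, which makes the safe configuration the default rather than something each teammate has to remember.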

The verdict

Marimo fixes the correctness problem that makes Jupyter notebooks quietly dangerous for ML work: hidden execution state. If you’re building LLM eval harnesses, prompt exploration tools, or any notebook you’ll eventually run headlessly in CI, marimo is the right foundation. It won’t replace Jupyter overnight in mature codebases, but for greenfield AI/ML projects it’s the notebook environment I’d reach for first.