Deep Dives
6 editions
- Embeddings in Practice: Every Major Model Compared
- omlx: Run Local LLMs on Apple Silicon with a RAG Customer Support App
A macOS-native LLM server for Apple Silicon with SSD KV caching that cuts cold-start prefill from 90s to under 5s, plus a complete tutorial for building a RAG customer support chatbot.
- Prompt Injection Prevention in Production
A taxonomy of prompt injection attacks and the layered defenses — input validation, output filtering, guardrails — that actually work at scale.
- The Inference Stack Top to Bottom
What happens between your API call and a streamed token — routing, batching, KV cache, quantization, and speculative decoding explained.
- MCP, Tool Use, and Function Calling: How Agents Actually Work in 2026
A comprehensive rundown of function calling, the Model Context Protocol, agent frameworks, and the patterns that hold up in production — across every major provider.
- API Rate Limits Compared: Every Major LLM Provider in One Place
A side-by-side comparison of rate limits across 15 LLM API providers — OpenAI, Anthropic, Google, Groq, xAI, DeepSeek, Mistral, Perplexity, Alibaba, Moonshot, and more — as of March 2026.