Deep Dives
6 editions
- Embeddings in Practice: Every Major Model Compared
- omlx: Run Local LLMs on Apple Silicon with a RAG Customer Support App
A macOS-native LLM server for Apple Silicon with SSD KV caching that cuts cold-start prefill from 90s to under 5s, plus a complete tutorial for building a RAG customer support chatbot.
- Prompt Injection Prevention in Production
A taxonomy of prompt injection attacks and the layered defenses — input validation, output filtering, guardrails — that actually work at scale.
- The Inference Stack Top to Bottom
What happens between your API call and a streamed token — routing, batching, KV cache, quantization, and speculative decoding explained.
- MCP, Tool Use, and Function Calling: How Agents Actually Work in 2026
A comprehensive rundown of function calling, the Model Context Protocol, agent frameworks, and the patterns that hold up in production — across every major provider.
- API Rate Limits Compared: Every Major LLM Provider in One Place
A side-by-side comparison of rate limits across 15 LLM API providers — OpenAI, Anthropic, Google, Groq, xAI, DeepSeek, Mistral, Perplexity, Alibaba, Moonshot, and more — as of March 2026.