Series

Deep Dives

6 editions

  1. Embeddings in Practice: Every Major Model Compared

  2. omlx: Run Local LLMs on Apple Silicon with a RAG Customer Support App

    omlx: a macOS-native LLM server for Apple Silicon with SSD KV caching that cuts cold-start prefill from 90s to under 5s. A complete RAG customer support chatbot tutorial is included.

  3. Prompt Injection Prevention in Production

    A taxonomy of prompt injection attacks and the layered defenses — input validation, output filtering, guardrails — that actually work at scale.

  4. The Inference Stack Top to Bottom

    What happens between your API call and a streamed token — routing, batching, KV cache, quantization, and speculative decoding explained.

  5. MCP, Tool Use, and Function Calling: How Agents Actually Work in 2026

    A comprehensive rundown of function calling, Model Context Protocol, agent frameworks, and the patterns that actually work in production — across every major provider.

  6. API Rate Limits Compared: Every Major LLM Provider in One Place

    A side-by-side comparison of rate limits across 15 LLM API providers — OpenAI, Anthropic, Google, Groq, xAI, DeepSeek, Mistral, Perplexity, Alibaba, Moonshot, and more — as of March 2026.