Where randomness meets reason
Tag
7 posts
TSCG shows small LLMs (4–14B) drop tool-call failures by compiling JSON schemas into natural-language descriptions before inference.
Rewriting tool descriptions at deployment time, not training time, can recover 20–40% of function-calling accuracy lost to poorly written API docs.
Visualizing LLM output distributions reveals hidden modes, edge cases, and prompt sensitivity that single-sample evaluation completely misses.
Lossless prompt compression via dictionary encoding lets LLMs analyze repeated data at a fraction of token cost — no external tools, just in-context learning.
Structured state-space models finally beat transformers at document retrieval — here's what the Mamba-based RAG benchmark actually shows.
KV cache compression that cuts memory 40–60% with under 1% accuracy loss — here's the technique your inference stack probably isn't using yet.
SCoRe trains a single LLM to catch and fix its own mistakes via RL — 15.6% better on math, 9.1% on code, no multi-model pipeline needed.