Paper of the Week — Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data
Lossless prompt compression via dictionary encoding lets LLMs analyze repeated data at a fraction of token cost — no external tools, just in-context learning.
Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data
Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh. Published 2026-03-19. arXiv:2604.13066
One sentence summary
LLMs can learn an encoding dictionary from a few in-context examples and then operate directly on compressed representations — achieving lossless token reduction with no fine-tuning, no external tools, and no accuracy loss.
Why this paper
Token costs are still the dominant variable expense in production LLM pipelines. Any lossless approach to compression that works purely in-context — without touching model weights — is immediately deployable.
What they did
Many real-world LLM workloads involve repeated structure: log files, CSVs, templated records, API traces. The authors tested whether you can teach an LLM, via a few in-context examples, to apply a custom encoding dictionary (e.g. short codes standing in for repeated strings or phrases) and then perform analysis tasks directly on the encoded input rather than the raw text. The key claim is that the encoding step is lossless (no semantic information is discarded) because the model holds the decoding key in its context window.
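To make the mechanics concrete, here is a minimal sketch of that idea on a toy log corpus: a small dictionary maps repeated strings to short codes, the raw text is rewritten with those codes, and holding the dictionary lets you recover the original exactly. The dictionary entries, codes, and log lines are invented for illustration and are not taken from the paper.

```python
# Minimal sketch: a dictionary of short codes for repeated strings, an encoder
# for the raw text, and an exact round trip to show the step is lossless.
# All entries below are invented for illustration, not taken from the paper.

# Hypothetical dictionary: short codes for strings that repeat across the corpus.
# The codes must not occur literally in the raw data, or decoding would be ambiguous.
DICTIONARY = {
    "§1": "2025-01-01T",              # shared timestamp prefix
    "§2": "GET /api/v2/orders",       # repeated request template
    "§3": "HTTP/1.1 200 OK",          # repeated status line
    "§4": "user_agent=Mozilla/5.0",   # repeated header fragment
}

def encode(text: str, dictionary: dict[str, str]) -> str:
    """Replace every dictionary phrase with its short code (longest phrase first)."""
    for code, phrase in sorted(dictionary.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(phrase, code)
    return text

def decode(text: str, dictionary: dict[str, str]) -> str:
    """Invert the encoding exactly; this is what makes the compression lossless."""
    for code, phrase in dictionary.items():
        text = text.replace(code, phrase)
    return text

raw_log = (
    "2025-01-01T08:14:02 GET /api/v2/orders HTTP/1.1 200 OK user_agent=Mozilla/5.0\n"
    "2025-01-01T08:14:05 GET /api/v2/orders HTTP/1.1 200 OK user_agent=Mozilla/5.0\n"
)

encoded_log = encode(raw_log, DICTIONARY)
assert decode(encoded_log, DICTIONARY) == raw_log   # round-trips exactly
print(encoded_log)
# §108:14:02 §2 §3 §4
# §108:14:05 §2 §3 §4
```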
Key findings
- Dictionary-encoded inputs achieved significant token reduction on repetitive corpora, with savings scaling proportionally to repetition rate in the data
- LLMs successfully performed downstream analysis tasks (classification, extraction, aggregation) directly on encoded representations without decoding first
- Accuracy on analytical tasks was preserved compared to uncompressed baselines
- The approach requires no fine-tuning — the dictionary is provided entirely in the prompt preamble
- Compression gains compound when multiple repeated patterns are encoded, making the technique most powerful on structured or semi-structured data
Why it matters for practitioners
If you’re running batch analytics over logs, user records, time-series annotations, or any corpus with repeated structure, this technique can cut your token spend without sacrificing accuracy or requiring model changes. It’s also composable with other cost-reduction strategies like caching or chunking, and works with any frontier model API today.
What you can use today
- Identify your highest-repetition fields (status codes, template strings, enumerated values, common phrases) and build a compact dictionary; prepend it to your system prompt along with a few decoded/encoded example pairs before sending encoded batches (see the first sketch after this list)
- Test the approach on your own data by measuring token counts before and after encoding with tiktoken or an equivalent tokenizer; the savings are easy to quantify before committing to a pipeline change (see the second sketch after this list)
- Combine with structured output prompting (JSON mode, etc.) where field names are themselves repeated tokens; those are often the lowest-hanging fruit for this technique
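First sketch, covering the dictionary-plus-preamble workflow from the first bullet: the dictionary and a couple of decoded/encoded example pairs go into the system prompt, and only the encoded batch travels in the user message. The wording, message layout, and task are assumptions for illustration, not the paper's exact prompt; encode, DICTIONARY, raw_log, and encoded_log come from the sketch earlier in this post.

```python
# Hypothetical prompt assembly: the dictionary and a couple of decoded/encoded
# example pairs form the preamble; the model then receives only encoded rows.
# Wording and message structure are illustrative assumptions, not the paper's prompt.

def build_preamble(dictionary: dict[str, str], example_rows: list[str]) -> str:
    lines = ["You will receive log lines compressed with this code dictionary:"]
    lines += [f"  {code} = {phrase}" for code, phrase in dictionary.items()]
    lines.append("Examples (original -> encoded):")
    lines += [f"  {row} -> {encode(row, dictionary)}" for row in example_rows]
    lines.append("Answer questions about the encoded lines directly; do not ask for the originals.")
    return "\n".join(lines)

messages = [
    {"role": "system", "content": build_preamble(DICTIONARY, raw_log.splitlines()[:2])},
    {"role": "user", "content": "How many requests returned 200 OK?\n\n" + encoded_log},
]
# `messages` can now be sent to any chat-completion style API.
```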
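Second sketch, covering the measurement step: counting tokens before and after encoding with tiktoken. Be sure to count the dictionary preamble itself against the compressed total, since it also consumes prompt tokens; names again reuse the earlier sketches.

```python
# Quantify the savings with tiktoken before changing any pipeline.
# Uses the raw_log, encoded_log, and build_preamble from the sketches above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # pick the encoding that matches your target model

raw_tokens = len(enc.encode(raw_log))
preamble = build_preamble(DICTIONARY, raw_log.splitlines()[:2])
compressed_tokens = len(enc.encode(preamble)) + len(enc.encode(encoded_log))

print(f"raw: {raw_tokens} tokens, encoded (incl. dictionary preamble): {compressed_tokens} tokens")
# On a two-line toy corpus the preamble dominates; the savings appear once the
# encoded batch is large relative to the dictionary.
```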