Paper of the Week — Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data
Lossless prompt compression via dictionary encoding lets LLMs analyze repeated data at a fraction of token cost — no external tools, just in-context learning.
Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data
Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh. Published 2026-03-19. arXiv:2604.13066
One sentence summary
LLMs can learn an encoding dictionary from a few in-context examples and then operate directly on compressed representations — achieving lossless token reduction with no fine-tuning, no external tools, and no accuracy loss.
Why this paper
Token costs are still the dominant variable expense in production LLM pipelines. Any lossless approach to compression that works purely in-context — without touching model weights — is immediately deployable.
What they did
Many real-world LLM workloads involve repeated structure: log files, CSVs, templated records, API traces. The authors tested whether you can teach an LLM, via a few in-context examples, to apply a custom encoding dictionary (e.g. short codes standing in for repeated strings or phrases) and then perform analysis tasks directly on the encoded input rather than the raw text. The key claim is that the encoding step is lossless (no semantic information is discarded) because the model holds the decoding key in its context window.
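To make the mechanics concrete, here is a minimal sketch of that idea on a toy log corpus: a small dictionary maps repeated strings to short codes, the raw text is rewritten with those codes, and holding the dictionary lets you recover the original exactly. The dictionary entries, codes, and log lines are invented for illustration and are not taken from the paper.

```python
# Minimal sketch: a dictionary of short codes for repeated strings, an encoder
# for the raw text, and an exact round trip to show the step is lossless.
# All entries below are invented for illustration, not taken from the paper.

# Hypothetical dictionary: short codes for strings that repeat across the corpus.
# The codes must not occur literally in the raw data, or decoding would be ambiguous.
DICTIONARY = {
    "§1": "2025-01-01T",              # shared timestamp prefix
    "§2": "GET /api/v2/orders",       # repeated request template
    "§3": "HTTP/1.1 200 OK",          # repeated status line
    "§4": "user_agent=Mozilla/5.0",   # repeated header fragment
}

def encode(text: str, dictionary: dict[str, str]) -> str:
    """Replace every dictionary phrase with its short code (longest phrase first)."""
    for code, phrase in sorted(dictionary.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(phrase, code)
    return text

def decode(text: str, dictionary: dict[str, str]) -> str:
    """Invert the encoding exactly; this is what makes the compression lossless."""
    for code, phrase in dictionary.items():
        text = text.replace(code, phrase)
    return text

raw_log = (
    "2025-01-01T08:14:02 GET /api/v2/orders HTTP/1.1 200 OK user_agent=Mozilla/5.0\n"
    "2025-01-01T08:14:05 GET /api/v2/orders HTTP/1.1 200 OK user_agent=Mozilla/5.0\n"
)

encoded_log = encode(raw_log, DICTIONARY)
assert decode(encoded_log, DICTIONARY) == raw_log   # round-trips exactly
print(encoded_log)
# §108:14:02 §2 §3 §4
# §108:14:05 §2 §3 §4
```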
Key findings
- Dictionary-encoded inputs achieved significant token reduction on repetitive corpora, with savings scaling proportionally to repetition rate in the data
- LLMs successfully performed downstream analysis tasks (classification, extraction, aggregation) directly on encoded representations without decoding first
- Accuracy on analytical tasks was preserved compared to uncompressed baselines
- The approach requires no fine-tuning — the dictionary is provided entirely in the prompt preamble
- Compression gains compound when multiple repeated patterns are encoded, making the technique most powerful on structured or semi-structured data
Why it matters for practitioners
If you’re running batch analytics over logs, user records, time-series annotations, or any corpus with repeated structure, this technique can cut your token spend without sacrificing accuracy or requiring model changes. It’s also composable with other cost-reduction strategies like caching or chunking, and works with any frontier model API today.
What you can use today
- Identify your highest-repetition fields (status codes, template strings, enumerated values, common phrases) and build a compact dictionary; prepend it to your system prompt along with a few decoded/encoded example pairs before sending encoded batches (see the first sketch after this list)
- Test the approach on your own data by measuring token counts before and after encoding with tiktoken or an equivalent tokenizer; the savings are easy to quantify before committing to a pipeline change (see the second sketch after this list)
- Combine with structured output prompting (JSON mode, etc.) where field names are themselves repeated tokens; those are often the lowest-hanging fruit for this technique
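First sketch, covering the dictionary-plus-preamble workflow from the first bullet: the dictionary and a couple of decoded/encoded example pairs go into the system prompt, and only the encoded batch travels in the user message. The wording, message layout, and task are assumptions for illustration, not the paper's exact prompt; encode, DICTIONARY, raw_log, and encoded_log come from the sketch earlier in this post.

```python
# Hypothetical prompt assembly: the dictionary and a couple of decoded/encoded
# example pairs form the preamble; the model then receives only encoded rows.
# Wording and message structure are illustrative assumptions, not the paper's prompt.

def build_preamble(dictionary: dict[str, str], example_rows: list[str]) -> str:
    lines = ["You will receive log lines compressed with this code dictionary:"]
    lines += [f"  {code} = {phrase}" for code, phrase in dictionary.items()]
    lines.append("Examples (original -> encoded):")
    lines += [f"  {row} -> {encode(row, dictionary)}" for row in example_rows]
    lines.append("Answer questions about the encoded lines directly; do not ask for the originals.")
    return "\n".join(lines)

messages = [
    {"role": "system", "content": build_preamble(DICTIONARY, raw_log.splitlines()[:2])},
    {"role": "user", "content": "How many requests returned 200 OK?\n\n" + encoded_log},
]
# `messages` can now be sent to any chat-completion style API.
```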
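Second sketch, covering the measurement step: counting tokens before and after encoding with tiktoken. Be sure to count the dictionary preamble itself against the compressed total, since it also consumes prompt tokens; names again reuse the earlier sketches.

```python
# Quantify the savings with tiktoken before changing any pipeline.
# Uses the raw_log, encoded_log, and build_preamble from the sketches above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # pick the encoding that matches your target model

raw_tokens = len(enc.encode(raw_log))
preamble = build_preamble(DICTIONARY, raw_log.splitlines()[:2])
compressed_tokens = len(enc.encode(preamble)) + len(enc.encode(encoded_log))

print(f"raw: {raw_tokens} tokens, encoded (incl. dictionary preamble): {compressed_tokens} tokens")
# On a two-line toy corpus the preamble dominates; the savings appear once the
# encoded batch is large relative to the dictionary.
```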