
Paper of the Week — TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

TSCG shows small LLMs (4–14B) drop tool-call failures by compiling JSON schemas into natural-language descriptions before inference.

Weekly: one research paper, broken down for people who build things.

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

Furkan Sakizli. Published 2026-05-07. arXiv:2605.04107

One-sentence summary

Compiling JSON tool schemas into structured natural-language descriptions at deploy time — rather than passing raw JSON at runtime — cuts tool-use failure rates dramatically for small models in production agent frameworks.

Why this paper

Most teams reach for GPT-5.4 or Claude Opus 4.6 when they need reliable tool calling, but cost and latency push high-throughput workloads toward smaller models. This paper directly addresses the failure mode that makes 4–14B models frustrating to deploy as agents.

What they did

Production agent frameworks (OpenAI Function Calling, Anthropic Tool Use, MCP) pass tool schemas as raw JSON — a format optimized for machine parsing, not LLM interpretation. Small models struggle to correctly map intent to the right tool and arguments when reading JSON schemas cold. The paper proposes TSCG: a compilation step that transforms JSON schemas into deterministic, natural-language tool descriptions with explicit argument contracts, injected once at system-prompt construction rather than repeated per-call.
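
The paper describes the compiler abstractly, so here is a minimal Python sketch of what such a compile step could look like. The function name, prose template, and handled schema fields are illustrative assumptions, not TSCG's published interface:

```python
# Minimal sketch of a TSCG-style compile step. The paper does not ship
# reference code here, so compile_schema and the exact prose template
# are illustrative assumptions, not the paper's implementation.

def compile_schema(name: str, schema: dict) -> str:
    """Compile one JSON tool schema into a deterministic prose description."""
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    parts = [f"Tool: {name}."]
    if "description" in schema:
        parts.append(f"Purpose: {schema['description']}")
    req, opt = [], []
    for arg, spec in props.items():
        default = f", default {spec['default']}" if "default" in spec else ""
        entry = f"{arg} ({spec.get('type', 'any')}{default})"
        (req if arg in required else opt).append(entry)
    if req:
        parts.append("Required args: " + "; ".join(req) + ".")
    if opt:
        parts.append("Optional args: " + "; ".join(opt) + ".")
    return " ".join(parts)


# Example against a small schema:
search_web = {
    "description": "Search the web and return ranked results.",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer", "default": 5},
    },
    "required": ["query"],
}
print(compile_schema("search_web", search_web))
# Tool: search_web. Purpose: Search the web and return ranked results.
# Required args: query (string). Optional args: max_results (integer, default 5).
```

Because the transformation is a pure function of the schema, the output is deterministic: the same schema always compiles to the same prose, which is what lets it run once at deploy time instead of per call.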

Key findings

  • For 4B–14B parameter models, raw JSON schema transmission accounts for the majority of tool-use failures in benchmarked production-style tasks
  • TSCG-compiled descriptions reduced malformed tool calls by over 60% on the tested task suite compared to passing raw JSON schemas directly
  • Performance gains were largest on tools with nested or optional parameters — exactly the schemas that trip up small models most
  • Compiled descriptions added negligible token overhead versus equivalent JSON (often fewer tokens due to pruning redundant structural syntax)
  • Results held across OpenAI Function Calling, Anthropic Tool Use, and MCP-style interfaces without framework-specific tuning

Why it matters for practitioners

If you’re routing agent traffic through a small model for cost reasons, the JSON-schema-to-LLM impedance mismatch is likely your biggest reliability bottleneck — not the model’s reasoning ability. TSCG is a compile-time fix, meaning you absorb the transformation cost once at deployment rather than paying it on every inference call.

What you can use today

  • Audit your current tool schemas: convert JSON to a structured prose format (“Tool: search_web. Purpose: … Required args: query (string) … Optional args: max_results (int, default 5)”) and test whether your small model’s call accuracy improves before changing anything else
  • Apply this pattern at system-prompt construction in LangGraph or your MCP server layer — generate the compiled description once when the agent initializes, not inside the per-turn prompt assembly (see the sketch after this list)
  • Prioritize compiled descriptions for tools with nested objects or anyOf / oneOf schemas; flat single-argument tools see smaller gains and may not be worth the added complexity
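
Wiring the second bullet into an agent might look like the sketch below. CompiledToolAgent is a hypothetical stand-in for your framework's agent wrapper, not LangGraph or MCP API, and it reuses the compile_schema sketch from earlier:

```python
# Hypothetical wiring, not LangGraph/MCP API: compile the tool prose once
# when the agent comes up, then reuse the cached string every turn.
# compile_schema is the sketch shown earlier in this post.

class CompiledToolAgent:
    def __init__(self, tool_schemas: dict[str, dict]):
        # Compile-time cost is paid exactly once, at initialization.
        self._tool_prose = "\n".join(
            compile_schema(name, schema)
            for name, schema in tool_schemas.items()
        )

    def system_prompt(self, instructions: str) -> str:
        # Per-turn assembly is pure concatenation; no raw JSON
        # schema ever reaches the model's context.
        return f"{instructions}\n\nAvailable tools:\n{self._tool_prose}"
```

The design point is where the work happens: the loop over schemas runs in the constructor, so per-turn prompt assembly stays a cheap string concatenation no matter how many tools the agent carries.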