Fugu-MT Paper Translation (Summary): Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation

Paper summary: Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation

arxiv url: http://arxiv.org/abs/2603.03306v1
Date: Sun, 08 Feb 2026 11:58:03 GMT
Status: Translation complete
System update date: 2026-03-09 01:20:08.132518
Title: Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
Title (reference translation): Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
Authors: Ivan Matveev
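Abstract summary: Token-Oriented Object Notation (TOON) is a serialization format for passing structured data to LLMs that aims to significantly reduce token usage. To test this, the author ran a benchmark that builds several test cases covering structural complexity and validation, and compares plain-text generation with structured output.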
Reference score (independently computed attention level): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is a lack of tests against JSON generation. Though never present in training data, TOON syntax is simple enough to suggest one-shot in-context learning could support accurate generation. The inevitable prompt overhead can be an acceptable trade-off for shorter completions. To test this, we conducted a benchmark creating several test cases with regard to structural complexity, a validation pipeline, and comparing plain JSON generation vs structured output (via constrained decoding) JSON generation vs TOON one-shot in-context learning generation. JSON structured output was included to establish a minimum token budget baseline and to set a starting point for future experiments testing TOON constrained decoding inference enforcement. Key findings: TOON shows promising accuracy/token consumption ratio for in-domain generation tasks, though this advantage is often reduced by the "prompt tax" of instructional overhead in shorter contexts. Plain JSON generation shows the best one-shot and final accuracy, even compared with constrained decoding structured output, where the only significant advantage is the lowest token usage as a trade-off for slightly decreased accuracy overall and significant degradation for some models. Notably, for simple structures, this "lowest token usage" of constrained decoding outperformed even TOON, hinting that TOON enforcing via frameworks such as xgrammar may not yield the desired results. Furthermore, the results suggest a scaling hypothesis: TOON's true efficiency potential likely follows a non-linear curve, shining only beyond a specific point where cumulative syntax savings amortize the initial prompt overhead.
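Abstract (reference translation): The recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs. While it shows solid accuracy in LLM comprehension, tests against JSON generation are lacking. Although TOON is absent from training data, its syntax is simple enough to suggest that one-shot in-context learning could support accurate generation. The unavoidable prompt overhead can be an acceptable trade-off for shorter completions. To test this, we ran a benchmark that creates several test cases with regard to structural complexity, builds a validation pipeline, and compares plain JSON generation, structured-output JSON generation (via constrained decoding), and TOON one-shot in-context learning generation. JSON structured output was included to establish a minimum token budget baseline and to set a starting point for future experiments testing TOON constrained decoding enforcement. Key findings: TOON shows a promising accuracy/token consumption ratio for in-domain generation tasks, but this advantage is often reduced by the "prompt tax" of instructional overhead in shorter contexts. Plain JSON generation shows the best one-shot and final accuracy, even compared with constrained-decoding structured output. Notably, for simple structures, the "lowest token usage" of constrained decoding outperformed even TOON, suggesting that enforcing TOON via frameworks such as xgrammar may not yield the desired results. TOON's true efficiency potential likely follows a non-linear curve, shining only beyond a specific point where cumulative syntax savings amortize the initial prompt overhead.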
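
The amortization idea at the end of the abstract can be illustrated with a minimal sketch. It assumes the simplest possible cost model, in which TOON pays a fixed prompt overhead once and then saves a constant number of tokens per generated record. The function name `break_even_records` and all numbers are hypothetical and are not taken from the paper, which suggests the real curve is likely non-linear.

```python
import math


def break_even_records(prompt_overhead_tokens, json_tokens_per_record, toon_tokens_per_record):
    """Smallest record count at which TOON's total token cost is no greater than JSON's.

    Assumed (hypothetical) cost model:
        JSON total = n * json_tokens_per_record
        TOON total = prompt_overhead_tokens + n * toon_tokens_per_record
    Returns None when TOON saves nothing per record, so it never breaks even.
    """
    saving_per_record = json_tokens_per_record - toon_tokens_per_record
    if saving_per_record <= 0:
        return None
    # Guard with max(1, ...) so a zero overhead still reports at least one record.
    return max(1, math.ceil(prompt_overhead_tokens / saving_per_record))


# Illustrative numbers only: 300 tokens of one-shot instructions,
# 40 tokens per JSON record, 25 tokens per TOON record.
print(break_even_records(300, 40, 25))  # 20
```

Under these assumed numbers, TOON would only start to pay off from 20 records onward; for shorter outputs the "prompt tax" dominates, which matches the qualitative claim in the abstract.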

