Fugu-MT paper translation (abstract): The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications

Paper summary: The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications

arxiv url: http://arxiv.org/abs/2606.20898v1
Date: Thu, 18 Jun 2026 19:49:18 GMT
Status: Translation complete
Last updated in system: 2026-06-26 12:05:57.211057
Title: The Token Tax of Epistemic Accuracy: Comparing RAG and Long-Context Architectures for Document-Grounded Generative AI Applications
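Title (reference translation): The Token Tax of Epistemic Accuracy: A Comparison of RAG and Long-Context Architectures in Document-Grounded Generative AI Applications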
Authors: Austin Hamilton, Ryan Singh, Michael Wise, Ibrahim Yousif, Arthur Carvalho, Zhe Shan, Mohammad Mayyas, Lora A. Cavuoto, Fadel M. Megahed
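Abstract summary: This paper compares two grounding architectures: (a) retrieval-augmented generation (RAG), which retrieves relevant passages, and (b) long-context prompting, which loads the entire document collection into context. Using an expert-validated benchmark, it evaluates 972 answers across three machines, two small language models, and three retrieval/in-context prompting approaches. Long-context prompting is the most accurate (73.1% vs. 65.4% for semantic RAG), but at 26 times the per-query token cost.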
Reference score (independently computed attention score): 3.566534817171158
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Document-grounded assistants built on large language models are increasingly used in high-stakes, knowledge-intensive work. Their usefulness, however, may depend on how evidence is allocated before generation. We investigate such a claim by comparing two grounding architectures: (a) retrieval-augmented generation (RAG) that retrieves a few relevant passages, and (b) long-context prompting, which loads the whole document collection in context. We view these as two regimes of "epistemic access" on an accuracy-cost frontier. We use "epistemic accuracy" to capture model correctness that depends on having the right evidence. We posit that broader access (via long context) can increase it, but with a "token tax" (i.e., a substantial increase in cost due to larger input token consumption). We probe this framing with a case study in manufacturing safety training. Using an expert-validated benchmark, we evaluate 972 answers across three machines, two small language models, and three retrieval/in-context prompting approaches. Long-context prompting achieved the highest correctness (73.1% vs. 65.4% for semantic RAG), but at 26 times the per-query token cost. We interpret this gap as the token tax of broader evidentiary access. We carefully discuss the implications of our findings for resource-constrained organizations.
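Abstract (reference translation): Document-grounded assistants built on large language models are increasingly being used for high-stakes, knowledge-intensive work. Their usefulness, however, may vary with how evidence is allocated before generation. We examine this claim by comparing two grounding architectures: (a) retrieval-augmented generation (RAG), which retrieves a few relevant passages, and (b) long-context prompting, which loads the entire document collection into context. We regard these as two regimes of "epistemic access" on an accuracy-cost frontier. We use "epistemic accuracy" to capture model correctness that depends on having the right evidence. We posit that broader access (through long context) can increase it, but with a "token tax" (a substantial increase in cost caused by greater input token consumption). We probe this framing with a case study in manufacturing safety training. Using an expert-validated benchmark, we evaluate 972 answers across three machines, two small language models, and three retrieval/in-context prompting approaches. Long-context prompting achieved the highest correctness (73.1% vs. 65.4% for semantic RAG), but at 26 times the per-query token cost. We interpret this gap as the token tax of broader evidentiary access. We carefully discuss the implications of the findings for resource-constrained organizations.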
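
The trade-off reported in the abstract can be made concrete with a little arithmetic. The sketch below is only an illustration built from the three figures stated above (73.1%, 65.4%, and the 26x per-query token cost). The function name, the dictionary keys, and the derived ratios are assumptions introduced here, not metrics defined by the authors.

```python
# Figures quoted in the abstract.
LONG_CONTEXT_ACCURACY = 0.731   # correctness of long-context prompting
SEMANTIC_RAG_ACCURACY = 0.654   # correctness of semantic RAG
TOKEN_COST_RATIO = 26.0         # long-context tokens per query / RAG tokens per query


def token_tax_summary(acc_long: float, acc_rag: float, cost_ratio: float) -> dict:
    """Derive simple accuracy-versus-cost ratios (hypothetical helper, not from the paper).

    Token cost is normalized so that one semantic-RAG query costs 1 unit.
    """
    gain = acc_long - acc_rag
    return {
        # Absolute gain in percentage points.
        "gain_points": gain * 100,
        # Gain relative to the RAG baseline.
        "relative_gain": gain / acc_rag,
        # Expected correct answers per unit of token cost.
        "rag_correct_per_unit": acc_rag / 1.0,
        "long_correct_per_unit": acc_long / cost_ratio,
        # Extra cost units spent for each additional correct answer.
        "extra_units_per_extra_correct": (cost_ratio - 1.0) / gain,
    }


summary = token_tax_summary(
    LONG_CONTEXT_ACCURACY, SEMANTIC_RAG_ACCURACY, TOKEN_COST_RATIO
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, long context buys about 7.7 percentage points (roughly a 12% relative improvement), and each additional correct answer costs on the order of 325 RAG-query equivalents of input tokens. Whether that price is acceptable depends on the cost of a wrong answer in the deployment, which this calculation does not model.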

