Fugu-MT Paper Translation (Overview): RAG over Thinking Traces Can Improve Reasoning Tasks

Paper overview: RAG over Thinking Traces Can Improve Reasoning Tasks

arxiv url: http://arxiv.org/abs/2605.03344v1
Date: Tue, 05 May 2026 04:03:28 GMT
Status: Translation complete
Last updated (in system): 2026-05-06 19:35:43.757099
Title: RAG over Thinking Traces Can Improve Reasoning Tasks
Authors: Negar Arabzadeh, Wenjie Ma, Sewon Min, Matei Zaharia,
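Abstract summary: Retrieval-augmented generation (RAG) has proven effective for knowledge-intensive tasks, but is widely believed to offer limited benefit for reasoning-intensive problems. This paper proposes retrieving thinking traces, i.e., the intermediate thinking trajectories generated during problem-solving attempts. Using these traces as a corpus, a simple retrieve-then-generate pipeline consistently improves reasoning performance.
Reference score (in-house attention score): 45.57562898423325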
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Retrieval-augmented generation (RAG) has proven effective for knowledge-intensive tasks, but is widely believed to offer limited benefit for reasoning-intensive problems such as math and code generation. We challenge this assumption by showing that the limitation lies not in RAG itself, but in the choice of corpus. Instead of retrieving documents, we propose retrieving thinking traces, i.e., intermediate thinking trajectories generated during problem solving attempts. We show that thinking traces are already a strong retrieval source, and further introduce T3, an offline method that transforms them into structured, retrieval-friendly representations, to improve usability. Using these traces as a corpus, a simple retrieve-then-generate pipeline consistently improves reasoning performance across strong models and benchmarks such as AIME 2025-2026, LiveCodeBench, and GPQA-Diamond, outperforming both non-RAG baselines and retrieval over standard web corpora. For instance, on AIME, RAG with traces generated by Gemini-2-thinking achieves relative gains of +56.3%, +8.6%, and +7.6% for Gemini-2.5-Flash, GPT-OSS-120B, and GPT-5, respectively, even though these are more recent models. Interestingly, RAG on T3 also incurs little or no extra inference cost, and can even reduce inference cost by up to 15%. Overall, our results suggest that thinking traces are an effective retrieval corpus for reasoning tasks, and transforming them into structured, compact, or diagnostic representations unlocks even stronger gains. Code available at https://github.com/Narabzad/t3.


Related papers