Fugu-MT paper translation (abstract): Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling

Paper summary: Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling

arxiv url: http://arxiv.org/abs/2601.21684v1
Date: Thu, 29 Jan 2026 13:18:36 GMT
Status: translation complete
Last updated in system: 2026-01-30 16:22:49.841332
Title: Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling
Title (reference translation): Don't Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling
Authors: Xinglin Wang, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Yiwei Li, Yueqi Zhang, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li
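Abstract summary: We propose RSE, a self-guided, training-free strategy for test-time search. RSE turns test-time search from a series of isolated trials into a cumulative process. RSE consistently outperforms strong baselines at comparable computational cost.
Reference score (independently computed attention measure): 37.83913102876393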
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-Time Scaling enhances the reasoning capabilities of Large Language Models by allocating additional inference compute to broaden the exploration of the solution space. However, existing search strategies typically treat rollouts as disposable samples, where valuable intermediate insights are effectively discarded after each trial. This systemic memorylessness leads to massive computational redundancy, as models repeatedly re-derive discovered conclusions and revisit known dead ends across extensive attempts. To bridge this gap, we propose Recycling Search Experience (RSE), a self-guided, training-free strategy that turns test-time search from a series of isolated trials into a cumulative process. By actively distilling raw trajectories into a shared experience bank, RSE enables positive recycling of intermediate conclusions to shortcut redundant derivations and negative recycling of failure patterns to prune encountered dead ends. Theoretically, we provide an analysis that formalizes the efficiency gains of RSE, validating its advantage over independent sampling in solving complex reasoning tasks. Empirically, extensive experiments on HMMT24, HMMT25, IMO-Bench, and HLE show that RSE consistently outperforms strong baselines with comparable computational cost, achieving state-of-the-art scaling efficiency.
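Abstract (reference translation): Test-time scaling improves the reasoning ability of large language models by allocating additional inference compute to widen exploration of the solution space. Existing search strategies, however, usually treat rollouts as disposable samples. This systemic lack of memory causes massive computational redundancy: across many attempts, models repeatedly re-derive conclusions they have already found and revisit known dead ends. To close this gap, we propose Recycling Search Experience (RSE), a self-guided, training-free strategy that turns test-time search from a series of isolated trials into a cumulative process. By actively distilling raw trajectories into a shared experience bank, RSE enables positive recycling of intermediate conclusions, which shortcuts redundant derivations, and negative recycling of failure patterns, which prunes dead ends already encountered. Theoretically, we formalize the efficiency gains of RSE and show its advantage over independent sampling on complex reasoning tasks. Empirically, extensive experiments on HMMT24, HMMT25, IMO-Bench, and HLE show that RSE consistently outperforms strong baselines at comparable computational cost and achieves state-of-the-art scaling efficiency.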
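
The abstract describes the mechanism only at a high level, so the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes each rollout can be distilled into a list of reusable conclusions and a list of dead ends, that the bank is passed to the next rollout as plain-text context, and that the loop stops at the first rollout that returns an answer. All names (`Trajectory`, `ExperienceBank`, `recycled_search`, `toy_rollout`) are hypothetical, and `toy_rollout` stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Trajectory:
    """Distilled result of one rollout (hypothetical structure)."""
    conclusions: List[str]          # intermediate results worth reusing
    dead_ends: List[str]            # approaches that failed
    answer: Optional[str] = None    # final answer, if the rollout produced one


@dataclass
class ExperienceBank:
    """Memory shared across rollouts; insertion-ordered, no duplicates."""
    conclusions: List[str] = field(default_factory=list)
    dead_ends: List[str] = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        # Positive recycling: keep conclusions so later rollouts can skip them.
        for c in traj.conclusions:
            if c not in self.conclusions:
                self.conclusions.append(c)
        # Negative recycling: keep failures so later rollouts can avoid them.
        for d in traj.dead_ends:
            if d not in self.dead_ends:
                self.dead_ends.append(d)

    def as_context(self) -> str:
        lines = ["Known conclusions (reuse, do not re-derive):"]
        lines += [f"- {c}" for c in self.conclusions]
        lines.append("Known dead ends (avoid):")
        lines += [f"- {d}" for d in self.dead_ends]
        return "\n".join(lines)


Rollout = Callable[[str, str], Trajectory]


def recycled_search(
    problem: str, rollout: Rollout, budget: int
) -> Tuple[Optional[str], ExperienceBank]:
    """Run up to `budget` rollouts, feeding each one the bank built so far."""
    bank = ExperienceBank()
    for _ in range(budget):
        traj = rollout(problem, bank.as_context())
        bank.add(traj)
        if traj.answer is not None:  # assumption: stop at the first answer
            return traj.answer, bank
    return None, bank


def toy_rollout(problem: str, context: str) -> Trajectory:
    """Stub model: needs 'lemma A' in context before it can answer."""
    if "lemma A" not in context:
        return Trajectory(["lemma A"], ["brute force"])
    return Trajectory([], [], answer="42")


if __name__ == "__main__":
    answer, bank = recycled_search("toy problem", toy_rollout, budget=4)
    print(answer)
    print(bank.as_context())
```

In this toy setting the first rollout contributes a lemma and a dead end, and the second rollout answers because it sees the lemma in its context. With independent sampling, where every rollout starts from an empty context, the stub would never answer.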

Related papers