Fugu-MT Paper Translation (Abstract): FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Paper overview: FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

arxiv url: http://arxiv.org/abs/2604.02967v1
Date: Fri, 03 Apr 2026 11:03:02 GMT
Status: Translation complete
System update date: 2026-04-06 17:20:24.455367
Title: FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models
Title (reference translation): "Forest of Errors" Makes the First Solution the Best in Large Reasoning Models
Authors: Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, Guojie Song
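Abstract summary: The paper examines the "The First is The Best" phenomenon, in which alternative solutions are not merely suboptimal but potentially detrimental. It proposes a self-guided efficient reasoning framework with two components: Refining First, which suppresses FoE growth in the first solution, and Discarding Subs, which prunes subsequent FoE via dual-consistency.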
Reference score (independently computed attention score): 10.994880611133548
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent Large Reasoning Models (LRMs) like DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring multiple alternative solutions. Upon closer inspection, however, we uncover a surprising phenomenon: The First is The Best, where alternative solutions are not merely suboptimal but potentially detrimental. This observation challenges widely accepted test-time scaling laws, leading us to hypothesize that errors within the reasoning path scale concurrently with test time. Through comprehensive empirical analysis, we characterize errors as a forest-structured Forest of Errors (FoE) and conclude that FoE makes the First the Best, which is underpinned by rigorous theoretical analysis. Leveraging these insights, we propose RED, a self-guided efficient reasoning framework comprising two components: I) Refining First, which suppresses FoE growth in the first solution; and II) Discarding Subs, which prunes subsequent FoE via dual-consistency. Extensive experiments across five benchmarks and six backbone models demonstrate that RED outperforms eight competitive baselines, achieving performance gains of up to 19.0% while reducing token consumption by 37.7% ~ 70.4%. Moreover, comparative experiments on FoE metrics shed light on how RED achieves effectiveness.
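Abstract (reference translation): Recent Large Reasoning Models (LRMs) such as DeepSeek-R1 have shown remarkable success on complex reasoning tasks and exhibit human-like patterns when exploring multiple alternative solutions. On closer inspection, however, we uncover a surprising phenomenon, The First is The Best: alternative solutions are not merely suboptimal but potentially detrimental. This observation challenges widely accepted test-time scaling laws and leads us to hypothesize that errors within the reasoning path scale concurrently with test time. Through comprehensive empirical analysis, we characterize errors as a forest-structured Forest of Errors (FoE) and conclude, with support from rigorous theoretical analysis, that FoE makes the First the Best. Building on these insights, we propose RED, a self-guided efficient reasoning framework with two components: Refining First, which suppresses FoE growth in the first solution, and Discarding Subs, which prunes subsequent FoE via dual-consistency. Extensive experiments across five benchmarks and six backbone models show that RED outperforms eight competitive baselines, achieving performance gains of up to 19.0% while reducing token consumption by 37.7% to 70.4%. In addition, comparative experiments on FoE metrics shed light on how RED achieves its effectiveness.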

Related papers