Fugu-MT paper translation (abstract): R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

Paper overview: R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?

arxiv url: http://arxiv.org/abs/2510.08189v1
Date: Thu, 09 Oct 2025 13:16:22 GMT
Status: translation complete
Last updated in system: 2025-10-10 17:54:15.0889
Title: R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
Authors: Yi Lu, Jianing Wang, Linsen Guo, Wei He, Hongyin Tang, Tao Gui, Xuanjing Huang, Xuezhi Cao, Wei Wang, Xunliang Cai
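Abstract summary: R-HORIZON is a method designed to stimulate long-horizon reasoning behaviors in Large Reasoning Models (LRMs). Based on R-HORIZON, the authors construct a long-horizon reasoning benchmark comprising complex multi-step reasoning tasks with interdependent problems that span long reasoning horizons. The analysis reveals that LRMs exhibit limited effective reasoning length and struggle to allocate thinking budget across multiple problems.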
Reference score (independently calculated attention score): 63.51955244144878
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek-R1) have led to remarkable improvements through long Chain-of-Thought (CoT). However, existing benchmarks mainly focus on immediate, single-horizon tasks, failing to adequately evaluate models' ability to understand and respond to complex, long-horizon scenarios. To address this incomplete evaluation of Large Reasoning Models (LRMs), we propose R-HORIZON, a method designed to stimulate long-horizon reasoning behaviors in LRMs through query composition. Based on R-HORIZON, we construct a long-horizon reasoning benchmark, comprising complex multi-step reasoning tasks with interdependent problems that span long reasoning horizons. Through comprehensive evaluation of LRMs using the R-HORIZON benchmark, we find that even the most advanced LRMs suffer significant performance degradation. Our analysis reveals that LRMs exhibit limited effective reasoning length and struggle to allocate thinking budget across multiple problems appropriately. Recognizing these limitations, we use R-HORIZON to construct long-horizon reasoning data for reinforcement learning with verified rewards (RLVR). Compared to training with single-horizon data, RLVR with R-HORIZON not only substantially improves performance on the multi-horizon reasoning tasks, but also promotes accuracy on standard reasoning tasks, with an increase of 7.5 on AIME2024. These results position R-HORIZON as a scalable, controllable, and low-cost paradigm for enhancing and evaluating the long-horizon reasoning capabilities of LRMs.

Related papers