Fugu-MT Paper Translation (Summary): Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Paper summary: Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

arXiv URL: http://arxiv.org/abs/2509.26626v1
Date: Tue, 30 Sep 2025 17:58:03 GMT
Status: Translation complete
Last updated (in system): 2025-10-01 14:45:00.242746
Title: Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Authors: Siddarth Venkatraman, Vineet Jain, Sarthak Mittal, Vedant Shah, Johan Obando-Ceron, Yoshua Bengio, Brian R. Bartoldson, Bhavya Kailkhura, Guillaume Lajoie, Glen Berseth, Nikolay Malkin, Moksh Jain
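Abstract summary: Inference-time compute can be scaled in parallel, by choosing among multiple independent solutions, or sequentially, through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods.
Reference score (in-house attention score): 85.76129014170778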
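The two baseline scaling modes named above can be sketched as follows. This is a minimal illustration, not code from the paper: `parallel_majority_vote` and `sequential_refine` are hypothetical names, majority voting is just one common rule for "choosing among multiple independent solutions" (assumed here, not specified by the source), and `sample_answer`, `generate`, and `refine` stand in for LLM calls.

```python
from collections import Counter
from typing import Callable, Hashable, TypeVar

T = TypeVar("T")
A = TypeVar("A", bound=Hashable)


def parallel_majority_vote(sample_answer: Callable[[], A], n: int) -> A:
    """Parallel scaling (assumed rule): draw n independent answers, keep the most common."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]


def sequential_refine(
    generate: Callable[[], T], refine: Callable[[T], T], steps: int
) -> T:
    """Sequential scaling: start from one solution and refine it step by step."""
    solution = generate()
    for _ in range(steps):
        solution = refine(solution)
    return solution
```

Parallel scaling spends compute on breadth (more independent samples), while sequential scaling spends it on depth (more refinement rounds on a single solution).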
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.
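The RSA loop described in the abstract can be sketched as below: keep a population of candidates, and at each step rebuild the whole population by aggregating subsets of the previous one. This is a hypothetical sketch, not the authors' implementation (see the linked repository for that). The parameter names `population_size`, `subset_size`, and `steps`, the assumption that subsets are sampled uniformly without replacement, and the toy `union_aggregate` are all illustrative choices. In RSA the aggregation step is an LLM prompted with the subset's reasoning chains; here the toy merges sets of "correct steps" only so the code runs.

```python
import random
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")  # a candidate solution, e.g. a full reasoning chain


def recursive_self_aggregation(
    generate: Callable[[], C],
    aggregate: Callable[[Sequence[C]], C],
    population_size: int,
    subset_size: int,
    steps: int,
    seed: int = 0,
) -> List[C]:
    """Hypothetical RSA sketch: iteratively aggregate random subsets of a population."""
    if not 1 <= subset_size <= population_size:
        raise ValueError("subset_size must be between 1 and population_size")
    rng = random.Random(seed)
    # Parallel part: an initial population of independent candidates.
    population = [generate() for _ in range(population_size)]
    # Sequential part: each step replaces the population with aggregates of
    # randomly sampled subsets (assumed: uniform, without replacement).
    for _ in range(steps):
        population = [
            aggregate(rng.sample(population, subset_size))
            for _ in range(population_size)
        ]
    return population


def union_aggregate(subset: Sequence[frozenset]) -> frozenset:
    """Toy stand-in for an LLM aggregation call: merge the correct steps of each chain."""
    merged: frozenset = frozenset()
    for chain in subset:
        merged |= chain
    return merged
```

With `steps=0` this reduces to plain parallel sampling, and with `population_size=subset_size=1` each step aggregates a single candidate, i.e. a purely sequential refinement loop. The toy aggregator illustrates the bootstrapping idea: partially correct chains contribute different correct intermediate steps to the merged result.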

