Fugu-MT Paper Translation (Summary): RAST: Reasoning Activation in LLMs via Small-model Transfer

Paper Summary: RAST: Reasoning Activation in LLMs via Small-model Transfer

arxiv url: http://arxiv.org/abs/2506.15710v1
Date: Fri, 30 May 2025 17:57:08 GMT
Status: Translation complete
System update date: 2025-06-29 09:28:14.778019
Title: RAST: Reasoning Activation in LLMs via Small-model Transfer
Authors: Siru Ouyang, Xinyu Zhu, Zilin Xiao, Minhao Jiang, Yu Meng, Jiawei Han
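Abstract summary: Reinforcement learning (RL) has become a powerful approach for improving the reasoning capabilities of large language models (LLMs). Applying RL at scale can be resource-intensive, requiring multiple model copies and extensive GPU workloads. This paper proposes RAST, a simple yet effective method that transfers reasoning behaviors by injecting RL-induced probability adjustments from a small RL-trained model into larger models.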
Reference score (attention score calculated by this site): 33.32587030836428
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has become a powerful approach for improving the reasoning capabilities of large language models (LLMs), as evidenced by recent successes such as OpenAI's o1 and DeepSeek-R1. However, applying RL at scale remains intimidatingly resource-intensive, requiring multiple model copies and extensive GPU workloads. On the other hand, although RL is powerful, recent studies suggest that it does not fundamentally endow models with new knowledge; rather, it primarily reshapes the model's output distribution to activate reasoning capabilities latent in the base model. Building on this insight, we hypothesize that the changes in output probabilities induced by RL are largely model-size invariant, opening the door to a more efficient paradigm: training a small model with RL and transferring its induced probability shifts to larger base models. To test this hypothesis, we conduct a token-level analysis of decoding trajectories and find high alignment in RL-induced output distributions across model scales, validating it. Motivated by this, we propose RAST, a simple yet effective method that transfers reasoning behaviors by injecting RL-induced probability adjustments from a small RL-trained model into larger models. Experiments across multiple mathematical reasoning benchmarks show that RAST substantially and consistently enhances the reasoning capabilities of base models while requiring significantly less GPU memory than direct RL training, sometimes even yielding better performance than the RL-trained counterparts. Our findings offer new insights into the nature of RL-driven reasoning and practical strategies for scaling its benefits without incurring its full computational cost. The project page of RAST is available at https://ozyyshr.github.io/RAST/.
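The abstract does not spell out how the small model's probability adjustments are combined with the large model. The sketch below assumes a proxy-tuning-style logit arithmetic at each decoding step: the large base model's log-probabilities plus the difference between the small RL-trained model's and the small base model's log-probabilities, scaled by a hypothetical strength `alpha`. It also assumes all three models share one tokenizer and vocabulary. The function names (`rl_shift`, `transfer_logits`, `cosine`, `greedy`) are illustrative, not taken from the RAST code, and cosine similarity is only one possible way to compare RL-induced shifts across model scales, not necessarily the metric used in the paper.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits: Sequence[float]) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits)]


def rl_shift(base_logits: Sequence[float], rl_logits: Sequence[float]) -> List[float]:
    """Per-token change in log-probability induced by RL (RL model minus base model)."""
    return [r - b for r, b in zip(log_softmax(rl_logits), log_softmax(base_logits))]


def transfer_logits(
    large_base: Sequence[float],
    small_base: Sequence[float],
    small_rl: Sequence[float],
    alpha: float = 1.0,  # hypothetical strength of the transferred shift
) -> List[float]:
    """Assumed combination: large base log-probs + alpha * small-model RL shift."""
    if not (len(large_base) == len(small_base) == len(small_rl)):
        raise ValueError("all three models must share one vocabulary")
    shift = rl_shift(small_base, small_rl)
    return [lb + alpha * d for lb, d in zip(log_softmax(large_base), shift)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two shift vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy(logits: Sequence[float]) -> int:
    """Index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    # Toy 3-token vocabulary; all numbers are made up for illustration.
    large_base = [2.0, 1.0, 0.0]
    small_base = [1.0, 1.0, 1.0]
    small_rl = [1.0, 3.0, 1.0]  # RL pushed the small model toward token 1
    combined = transfer_logits(large_base, small_base, small_rl)
    print(greedy(large_base), greedy(combined))  # prints: 0 1
```

In this toy case the large base model alone would pick token 0, while the transferred shift moves the choice to token 1. In a real decoder the three logit vectors would come from forward passes of the three models on the same prefix at every step, so only the small model pair needs RL training; setting `alpha=0.0` recovers the unmodified large base model.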

Related Papers