Fugu-MT Paper Translation (Summary): RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

Paper summary: RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

arxiv url: http://arxiv.org/abs/2510.02263v1
Date: Thu, 02 Oct 2025 17:44:23 GMT
Status: Translation complete
Last updated in system: 2025-10-03 16:59:21.263174
Title: RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
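Title (reference translation): RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Authors: Yuxiao Qu, Anikait Singh, Yoonho Lee, Amrith Setlur, Ruslan Salakhutdinov, Chelsea Finn, Aviral Kumar
Abstract summary: The authors train models that can propose multiple abstractions for a given problem, then apply RL that incentivizes building a solution. The resulting RL training paradigm, called RLAD, jointly trains an abstraction generator and a solution generator. They show that, at large test budgets, allocating more test-time compute to generating abstractions benefits performance more than generating more solutions.
Reference score (independently computed attention score): 98.98963933669751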
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires realizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To address more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to be capable of proposing multiple abstractions given a problem, followed by RL that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup effectively enables structured exploration, decouples learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that allocating more test-time compute to generating abstractions is more beneficial for performance than generating more solutions at large test budgets, illustrating the role of abstractions in guiding meaningful exploration.
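Abstract (reference translation): Reasoning requires more than pattern matching or memorizing solutions: it requires identifying and implementing "algorithmic procedures" that can be used to derive answers to hard problems. Doing so requires recognizing the most relevant primitives, intermediate results, or shared procedures, and building on them. RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, but most reasoning traces learned by large models do not consistently capture or reuse procedures, and instead drift into verbose, degenerate exploration. To make reasoning more effective, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward successful reasoning. We train models that can propose multiple abstractions for a given problem, then apply RL that incentivizes building a solution while using the information these abstractions provide. The result is a two-player RL training paradigm, called RLAD, that jointly trains an abstraction generator and a solution generator. This setup effectively enables structured exploration, decouples the learning signals for abstraction proposal and solution generation, and improves generalization to harder problems. We also show that, at large test budgets, allocating more test-time compute to generating abstractions benefits performance more than generating more solutions, which illustrates the role of abstractions in guiding meaningful exploration.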
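
The abstract describes proposing several abstractions for a problem, generating solutions conditioned on them, and trading test-time compute between the two. The following is a minimal sketch of that inference-time budget split, not the authors' implementation. The function names, the even split of the budget, and the majority vote over final answers are all assumptions made for illustration. The two toy generators are hypothetical stand-ins for trained models and say nothing about the paper's results.

```python
from collections import Counter
from typing import Callable, List


def solve_with_abstractions(
    problem: str,
    propose_abstraction: Callable[[str, int], str],
    solve: Callable[[str, str, int], str],
    total_budget: int,
    num_abstractions: int,
) -> str:
    """Split a fixed sample budget evenly across abstractions, then vote.

    Assumption: the budget is counted in solution samples and the final
    answer is chosen by majority vote; the abstract specifies neither.
    """
    if num_abstractions < 1 or total_budget < num_abstractions:
        raise ValueError("need at least one solution sample per abstraction")

    solutions_per_abstraction = total_budget // num_abstractions
    answers: List[str] = []
    for i in range(num_abstractions):
        # Step 1: the abstraction generator proposes a short hint.
        abstraction = propose_abstraction(problem, i)
        # Step 2: the solution generator answers conditioned on that hint.
        for j in range(solutions_per_abstraction):
            answers.append(solve(problem, abstraction, j))

    # Step 3: aggregate the sampled answers (assumed majority vote).
    return Counter(answers).most_common(1)[0][0]


# Hypothetical toy generators, used only to exercise the control flow.
def toy_propose(problem: str, index: int) -> str:
    return f"hint-{index}"


def toy_solve(problem: str, abstraction: str, sample: int) -> str:
    # The first hint is deliberately misleading in this toy setup.
    return "5" if abstraction == "hint-0" else "4"


if __name__ == "__main__":
    print(solve_with_abstractions("2+2", toy_propose, toy_solve, 8, 4))
```

Raising `num_abstractions` while holding `total_budget` fixed shifts samples from "more solutions per abstraction" to "more abstractions", which is the trade-off the abstract refers to.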

Related papers