Fugu-MT Paper Translation (Abstract): e1: Learning Adaptive Control of Reasoning Effort

Paper overview: e1: Learning Adaptive Control of Reasoning Effort

arxiv url: http://arxiv.org/abs/2510.27042v1
Date: Thu, 30 Oct 2025 23:12:21 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:15.929768
Title: e1: Learning Adaptive Control of Reasoning Effort
Title (reference translation): e1: Adaptive Control Learning of Reasoning Effort
Authors: Michael Kleinman, Matthew Trager, Alessandro Achille, Wei Xia, Stefano Soatto
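Abstract summary: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. The paper proposes Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query.
Reference score (independently computed attention score): 88.51897900019485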
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Increasing the thinking budget of AI models can significantly improve accuracy, but not all questions warrant the same amount of reasoning. Users may prefer to allocate different amounts of reasoning effort depending on how they value output quality versus latency and cost. To leverage this tradeoff effectively, users need fine-grained control over the amount of thinking used for a particular query, but few approaches enable such control. Existing methods require users to specify the absolute number of desired tokens, but this requires knowing the difficulty of the problem beforehand to appropriately set the token budget for a query. To address these issues, we propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query. This approach eliminates dataset- and phase-specific tuning while producing better cost-accuracy tradeoff curves compared to standard methods. Users can dynamically adjust the cost-accuracy trade-off through a continuous effort parameter specified at inference time. We observe that the model automatically learns to allocate resources proportionally to the task difficulty and, across model scales ranging from 1.5B to 32B parameters, our approach enables approximately 3x reduction in chain-of-thought length while maintaining or improving performance relative to the base model used for RL training.
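Abstract (reference translation): Increasing the thinking budget of AI models can greatly improve accuracy, but not every question warrants the same amount of reasoning. Users may want to allocate different amounts of reasoning effort depending on how they weigh output quality against latency and cost. To exploit this tradeoff effectively, users need fine-grained control over the amount of thinking spent on a particular query, yet few approaches allow such control. Existing methods require users to specify the absolute number of desired tokens, which in turn requires knowing the difficulty of the problem in advance in order to set an appropriate token budget for the query. To address these issues, the authors propose Adaptive Effort Control, a self-adaptive reinforcement learning method that trains models to use a user-specified fraction of tokens relative to the current average chain-of-thought length for each query. This approach eliminates dataset- and phase-specific tuning and yields better cost-accuracy tradeoff curves than standard methods. Users can dynamically adjust the cost-accuracy tradeoff through a continuous effort parameter specified at inference time. The model automatically learns to allocate resources in proportion to task difficulty, and across model scales from 1.5B to 32B parameters the approach enables an approximately 3x reduction in chain-of-thought length while maintaining or improving performance relative to the base model used for RL training.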
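The idea of a relative budget (a user-specified fraction of the current average chain-of-thought length for each query) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the reward or training details, so the function names `target_length` and `shaped_reward`, the overshoot penalty, and the `penalty` weight are all assumptions made here for illustration only.

```python
from statistics import mean


def target_length(rollout_lengths, effort):
    """Token target for one query: `effort` times the mean CoT length
    of the current rollouts for that query (hypothetical helper)."""
    if not rollout_lengths:
        raise ValueError("need at least one rollout length")
    if effort <= 0:
        raise ValueError("effort must be positive")
    return effort * mean(rollout_lengths)


def shaped_reward(correct, length, target, penalty=0.5):
    """Assumed reward shape, NOT taken from the paper: 1 for a correct
    answer, minus a capped penalty for exceeding the target length."""
    overshoot = max(0.0, (length - target) / target)
    return (1.0 if correct else 0.0) - penalty * min(1.0, overshoot)


# The same effort value gives different absolute budgets per query,
# because the target scales with how long that query's rollouts are.
easy_rollouts = [200, 300, 400]      # short chains of thought
hard_rollouts = [4000, 6000, 8000]   # long chains of thought
easy_target = target_length(easy_rollouts, effort=0.5)
hard_target = target_length(hard_rollouts, effort=0.5)
print(easy_target, hard_target)
```

Because the target is defined relative to each query's own average length, the user does not need to know the difficulty of a problem beforehand; a single effort value yields a small budget on easy queries and a larger one on hard queries.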


List of related papers