Fugu-MT Paper Translation (Summary): DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models

Paper summary: DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models

arxiv url: http://arxiv.org/abs/2508.17803v1
Date: Mon, 25 Aug 2025 08:47:36 GMT
Status: Translation complete
Last updated in system: 2025-08-26 18:43:45.698084
Title: DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
Authors: Kaiwen Yan, Xuanqing Shi, Hongcheng Guo, Wenxuan Wang, Zhuosheng Zhang, Chengwei Qin,
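Abstract summary: Reasoning large language models (RLLMs) have recently demonstrated remarkable capabilities by performing structured and multi-step reasoning. We propose DRQA (Dynamic Reasoning Quota Allocation), a novel method that transfers the benefits of resource competition from batch processing to single-question inference.
Reference score (attention score computed by the site): 28.90035967715762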
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning large language models (RLLMs), such as OpenAI-O3 and DeepSeek-R1, have recently demonstrated remarkable capabilities by performing structured and multi-step reasoning. However, recent studies reveal that RLLMs often suffer from overthinking, i.e., producing unnecessarily lengthy reasoning chains even for simple questions, leading to excessive token consumption and computational inefficiency. Interestingly, we observe that when processing multiple questions in batch mode, RLLMs exhibit more resource-efficient behavior by dynamically compressing reasoning steps for easier problems, due to implicit resource competition. Inspired by this, we propose Dynamic Reasoning Quota Allocation (DRQA), a novel method that transfers the benefits of resource competition from batch processing to single-question inference. Specifically, DRQA leverages batch-generated preference data and reinforcement learning to train the model to allocate reasoning resources adaptively. By encouraging the model to internalize a preference for responses that are both accurate and concise, DRQA enables it to generate concise answers for simple questions while retaining sufficient reasoning depth for more challenging ones. Extensive experiments on a wide range of mathematical and scientific reasoning benchmarks demonstrate that DRQA significantly reduces token usage while maintaining, and in many cases improving, answer accuracy. By effectively mitigating the overthinking problem, DRQA offers a promising direction for more efficient and scalable deployment of RLLMs, and we hope it inspires further exploration into fine-grained control of reasoning behaviors.
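The abstract does not say how the batch-generated preference data are constructed. The sketch below is a rough illustration under stated assumptions, not the authors' method. It assumes that each candidate answer to one question is labeled with whether it is correct and how many tokens it used. It also assumes a simple ranking: correctness comes first and brevity second, which mirrors the stated goal of preferring responses that are "both accurate and concise". The names `Response` and `build_preference_pairs` are hypothetical. The resulting (chosen, rejected) pairs are the kind of input a pairwise preference-optimization objective could consume. The abstract itself says only that DRQA uses "reinforcement learning".

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Response:
    # Hypothetical record for one candidate answer to a single question.
    text: str
    correct: bool
    num_tokens: int


def preference_key(r: Response) -> tuple:
    # Assumed ranking: correctness first, then brevity (fewer tokens is better).
    return (r.correct, -r.num_tokens)


def build_preference_pairs(responses: list[Response]) -> list[tuple[Response, Response]]:
    """Return (chosen, rejected) pairs for one question.

    Hypothetical rule: a correct answer beats an incorrect one, and among
    correct answers the shorter one wins. Pairs where both answers are
    incorrect are skipped, since neither should be reinforced; exact ties
    are skipped as well.
    """
    pairs = []
    for a, b in combinations(responses, 2):
        if not (a.correct or b.correct):
            continue
        ka, kb = preference_key(a), preference_key(b)
        if ka == kb:
            continue
        chosen, rejected = (a, b) if ka > kb else (b, a)
        pairs.append((chosen, rejected))
    return pairs


# Illustrative candidates: the token counts are made up, not results from the paper.
candidates = [
    Response("batch-mode answer", correct=True, num_tokens=180),
    Response("single-question answer", correct=True, num_tokens=950),
    Response("truncated answer", correct=False, num_tokens=120),
]
for chosen, rejected in build_preference_pairs(candidates):
    print(f"{chosen.text!r} > {rejected.text!r}")
```

With these inputs, the short correct answer is preferred over both the long correct one and the incorrect one. A short but wrong answer never becomes "chosen". This ordering is one plausible way to reward concision without trading away accuracy. The actual DRQA pipeline may weigh these factors differently.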


List of related papers