Fugu-MT Paper Translation (Summary): Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs

Paper summary: Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs

arxiv url: http://arxiv.org/abs/2604.19780v1
Date: Sun, 29 Mar 2026 18:31:09 GMT
Status: Translation complete
System update date: 2026-05-04 02:32:14.071756
Title: Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs
Authors: Amirul Rahman, Aisha Karim, Kenji Nakamura, Yi-Fan Ng
Abstract summary: Budget-Adaptive Curriculum Reasoning (BACR) is a unified framework that jointly optimizes reasoning quality and token efficiency. It comprises a budget-conditioned unified policy, a curriculum-aware budget scheduler, and a truncation-aware dense reward mechanism.
Reference score (independently computed attention score): 1.7499351967216341
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling test-time compute via extended reasoning has become a key paradigm for improving the capabilities of large language models (LLMs). However, existing approaches optimize reasoning under fixed or uniformly sampled token budgets, ignoring the fundamental mismatch between problem difficulty and allocated compute. This leads to overthinking on easy problems and underthinking on hard ones, resulting in suboptimal token efficiency across diverse reasoning scenarios. In this paper, we propose Budget-Adaptive Curriculum Reasoning (BACR), a unified framework that jointly optimizes reasoning quality and token efficiency through three synergistic components: (1) a budget-conditioned unified policy that embeds the token budget as a continuous conditioning signal, eliminating the need for decoupled thinking and summarization strategies; (2) a curriculum-aware budget scheduler that adaptively shifts the training budget distribution from easy to hard problems based on real-time learning progress; and (3) a truncation-aware dense reward mechanism that provides fine-grained credit assignment at intermediate reasoning steps via process-level verification. We further introduce Budget-Conditioned Advantage Estimation (BCAE), a novel variance reduction technique that conditions the advantage baseline on the sampled budget, yielding more stable policy gradients. Experiments on mathematical reasoning benchmarks (MATH, GSM8K, AIME, and Minerva Math) demonstrate that BACR consistently outperforms strong baselines across all token budgets, achieving an accuracy improvement of up to 8.3% under tight budgets while reducing average token consumption by 34% compared to unconstrained reasoning.
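The abstract describes the curriculum-aware budget scheduler only at a high level: the training budget shifts from easy to hard problems as learning progresses. The Python sketch below is one possible reading under assumed details, not the authors' scheduler. `CurriculumBudgetScheduler`, its EMA-of-accuracy progress signal, and the `min_budget` and `ema_decay` parameters are all hypothetical. It splits a fixed token budget across difficulty buckets in proportion to how often each bucket is still answered incorrectly.

```python
from dataclasses import dataclass, field


@dataclass
class CurriculumBudgetScheduler:
    """Hypothetical curriculum-aware budget scheduler (illustrative only).

    Keeps an exponential moving average (EMA) of accuracy per difficulty
    bucket and splits a fixed token budget so that buckets the model still
    gets wrong receive more tokens.
    """

    num_buckets: int
    total_budget: int
    min_budget: int = 64      # hypothetical floor so easy buckets are never starved
    ema_decay: float = 0.9    # hypothetical smoothing for the learning-progress signal
    accuracy: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.total_budget < self.min_budget * self.num_buckets:
            raise ValueError("total_budget must cover min_budget for every bucket")
        if not self.accuracy:
            self.accuracy = [0.0] * self.num_buckets

    def update(self, bucket: int, correct: bool) -> None:
        """Record one graded rollout from `bucket` (the real-time progress signal)."""
        old = self.accuracy[bucket]
        self.accuracy[bucket] = self.ema_decay * old + (1.0 - self.ema_decay) * float(correct)

    def budgets(self) -> list[int]:
        """Per-bucket token budgets, weighted by the estimated failure rate."""
        weights = [1.0 - acc for acc in self.accuracy]
        total_weight = sum(weights)
        if total_weight == 0.0:
            # Every bucket is solved: fall back to an even split.
            return [self.total_budget // self.num_buckets] * self.num_buckets
        spare = self.total_budget - self.min_budget * self.num_buckets
        # int() floors, so the budgets never sum to more than total_budget.
        return [self.min_budget + int(spare * w / total_weight) for w in weights]
```

As accuracy on an easy bucket rises, its share shrinks toward `min_budget` and the freed tokens move to buckets the model still fails, which is the easy-to-hard shift the abstract describes. The floor is a design choice of this sketch so that easy problems keep a small allowance rather than none.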
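The abstract characterizes the truncation-aware dense reward only as fine-grained, process-level credit at intermediate reasoning steps. A minimal sketch of one possible reading, under loudly stated assumptions: a hypothetical process verifier has already labeled each reasoning step, steps in the verified prefix share a fixed bonus, and outcome credit is added only when the trace finished within its budget. The function name and the `step_bonus` and `outcome_reward` parameters are illustrative, not taken from the paper.

```python
def truncation_aware_dense_rewards(
    step_verdicts: list[bool],
    truncated: bool,
    final_correct: bool,
    step_bonus: float = 0.5,
    outcome_reward: float = 1.0,
) -> list[float]:
    """Hypothetical per-step rewards for one reasoning trace (illustrative only).

    step_verdicts: one verdict per reasoning step from an assumed process verifier.
    truncated: True if generation stopped at the token budget before an answer.
    final_correct: whether the final answer is correct (ignored when truncated).
    """
    n = len(step_verdicts)
    if n == 0:
        return []
    rewards = [0.0] * n
    # Dense process credit: each step in the verified prefix gets an equal share
    # of step_bonus, so total process credit never exceeds step_bonus.
    for i, ok in enumerate(step_verdicts):
        if not ok:
            break  # no credit from the first rejected step onward
        rewards[i] = step_bonus / n
    # Outcome credit only when the trace actually finished; a truncated trace
    # keeps its process credit instead of being scored as a plain failure.
    if not truncated and final_correct:
        rewards[-1] += outcome_reward
    return rewards
```

Dividing `step_bonus` by the number of steps is deliberate in this sketch: a flat per-step bonus would reward longer traces simply for being longer, which works against the token-efficiency goal.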

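The abstract states that Budget-Conditioned Advantage Estimation (BCAE) conditions the advantage baseline on the sampled budget. A minimal Python sketch, assuming the baseline is simply the mean reward of rollouts sampled with the same budget (similar in spirit to group-mean baselines such as GRPO; the paper's actual estimator may differ), compared against a single budget-agnostic mean. Both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def budget_conditioned_advantages(rewards: list[float], budgets: list[int]) -> list[float]:
    """Advantage = reward - b(budget), where b is the mean reward of rollouts
    sampled with the same token budget (hypothetical estimator)."""
    if len(rewards) != len(budgets):
        raise ValueError("rewards and budgets must have the same length")
    groups: dict[int, list[float]] = defaultdict(list)
    for reward, budget in zip(rewards, budgets):
        groups[budget].append(reward)
    baseline = {budget: mean(rs) for budget, rs in groups.items()}
    return [reward - baseline[budget] for reward, budget in zip(rewards, budgets)]


def global_baseline_advantages(rewards: list[float]) -> list[float]:
    """Budget-agnostic comparison: subtract one mean over the whole batch."""
    m = mean(rewards)
    return [reward - m for reward in rewards]


if __name__ == "__main__":
    # Toy batch (made-up rewards): tight-budget rollouts succeed less often.
    rewards = [0, 1, 0, 0, 1, 1, 1, 0]
    budgets = [256] * 4 + [2048] * 4
    print(pvariance(budget_conditioned_advantages(rewards, budgets)))  # 0.1875
    print(pvariance(global_baseline_advantages(rewards)))              # 0.25
```

Because the budget is an input the policy conditions on rather than something it samples, a baseline that depends only on the budget leaves the policy gradient unbiased in expectation; the in-group mean used here includes each rollout's own reward, a small approximation that a leave-one-out mean would remove. When success rates differ by budget, as in the toy batch, the per-budget baseline strips that between-budget gap out of the advantages, which is where the variance reduction comes from.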