Fugu-MT paper translation (abstract): LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning

Paper summary: LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning

arxiv url: http://arxiv.org/abs/2512.05325v1
Date: Fri, 05 Dec 2025 00:04:42 GMT
Status: translation complete
Last updated in system: 2025-12-13 22:40:56.84599
Title: LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Title (reference translation): LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning
Authors: Ömer Faruk Akgül, Yusuf Hakan Kalaycı, Rajgopal Kannan, Willie Neiswanger, Viktor Prasanna
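Abstract summary: LYNX is an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. The probe is trained and calibrated once on a generic mathematical corpus and then reused unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks.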
Reference score (independently computed attention score): 15.597220136913258
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. We introduce LYNX, an online early-exit mechanism that turns a model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues (e.g., "hmm", "wait") during generation, trains a lightweight probe on hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Crucially, we train and calibrate this probe once on a generic mathematical corpus and reuse it unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. Across three model families spanning 1.5B to 32B parameters, a single mathematically trained probe per base model yields strong accuracy-efficiency tradeoffs. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40-65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35-60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.
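Abstract (reference translation): Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink", continuing to reason after they already have enough information to answer correctly. This wastes inference-time compute and can hurt accuracy. Existing attempts to stop early either manipulate decoding with extra sampling and heuristics, rely on auxiliary verifier models, or operate only as post-hoc analysis pipelines without formal guarantees. LYNX is an online early-exit mechanism that turns the model's own hidden-state awareness into confidence-controlled stopping decisions. LYNX attaches exit decisions to naturally occurring reasoning cues during generation (e.g., "hmm", "wait"), trains a lightweight probe on the hidden states at those cue tokens using supervision from forced exits, and wraps the resulting scores in split conformal prediction to obtain distribution-free control over premature exits. Importantly, the probe is trained and calibrated once on a generic mathematical corpus and reused unchanged across benchmarks, decoding temperatures, and even non-mathematical tasks. On GSM8K, LYNX matches or improves baseline accuracy while reducing tokens by 40-65%; on MATH-500 it improves accuracy by up to 12 points with roughly 35-60% fewer tokens; on AIME 2024 it recovers baseline accuracy with more than 50% token savings; and on CommonsenseQA, a non-math benchmark, it transfers zero-shot with modest accuracy gains and up to 70% fewer tokens. Compared to state-of-the-art early-exit methods, LYNX offers competitive or superior Pareto frontiers while remaining fully online, requiring no proxy models at inference, and providing explicit, user-tunable confidence guarantees.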
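The abstract says that probe scores are wrapped in split conformal prediction to control premature exits. The sketch below illustrates one generic way such a threshold could be calibrated; it is not taken from the paper. It assumes the probe outputs a score where higher means "safer to exit", and that a held-out calibration set provides scores for cue tokens where a forced exit would have been wrong. The names `conformal_exit_threshold`, `should_exit`, and `unsafe_scores` are hypothetical.

```python
import math


def conformal_exit_threshold(unsafe_scores, alpha):
    """Calibrate an exit threshold with split conformal prediction.

    unsafe_scores: probe scores on calibration cue tokens where exiting
        would have produced a wrong answer (assumed setup, not the paper's).
    alpha: target upper bound on the rate of exiting at an unsafe cue.

    Returns t such that, for a new unsafe cue exchangeable with the
    calibration ones, P(score > t) <= alpha.
    """
    n = len(unsafe_scores)
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or k > n:
        # Too few calibration points for this alpha: never exit early.
        return math.inf
    return sorted(unsafe_scores)[k - 1]


def should_exit(score, threshold):
    """Exit only when the probe score strictly exceeds the threshold."""
    return score > threshold


# Illustrative calibration scores (made up for this example).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
threshold = conformal_exit_threshold(calibration, alpha=0.25)
print(threshold, should_exit(0.65, threshold), should_exit(0.55, threshold))
```

A smaller `alpha` raises the threshold, giving fewer early exits and fewer premature ones; this is the kind of user-tunable knob the abstract refers to. With very small calibration sets the function returns infinity, so no cue ever triggers an exit.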
