Fugu-MT Paper Translation (Abstract): ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

Paper overview: ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning

arxiv url: http://arxiv.org/abs/2601.04973v1
Date: Thu, 08 Jan 2026 14:22:58 GMT
Status: Translation complete
Last updated in system: 2026-01-09 17:01:53.231138
Title: ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning
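Title (reference translation): ConMax: Confidence-Maximizing Compression for Efficient Chain-of-Thought Reasoning
Authors: Minda Hu, Zexuan Qiu, Zenan Xu, Kun Li, Bo Zhou, Irwin King
Abstract summary: Large reasoning models generate redundant reasoning paths that increase computational cost without improving accuracy. This paper introduces ConMax, a novel reinforcement learning framework that automatically compresses reasoning traces. Experiments on five reasoning datasets show that ConMax achieves a superior efficiency-performance trade-off.
Reference score (independently computed attention measure): 46.481679150652205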
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent breakthroughs in Large Reasoning Models (LRMs) have demonstrated that extensive Chain-of-Thought (CoT) generation is critical for enabling intricate cognitive behaviors, such as self-verification and backtracking, to solve complex tasks. However, this capability often leads to "overthinking", where models generate redundant reasoning paths that inflate computational costs without improving accuracy. While Supervised Fine-Tuning (SFT) on reasoning traces is a standard paradigm for the "cold start" phase, applying existing compression techniques to these traces often compromises logical coherence or incurs prohibitive sampling costs. In this paper, we introduce ConMax (Confidence-Maximizing Compression), a novel reinforcement learning framework designed to automatically compress reasoning traces while preserving essential reasoning patterns. ConMax formulates compression as a reward-driven optimization problem, training a policy to prune redundancy by maximizing a weighted combination of answer confidence for predictive fidelity and thinking confidence for reasoning validity through a frozen auxiliary LRM. Extensive experiments across five reasoning datasets demonstrate that ConMax achieves a superior efficiency-performance trade-off. Specifically, it reduces inference length by 43% over strong baselines at the cost of a mere 0.7% dip in accuracy, proving its effectiveness in generating high-quality, efficient training data for LRMs.
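Abstract (reference translation): Recent breakthroughs in Large Reasoning Models (LRMs) have shown that extensive Chain-of-Thought (CoT) generation is important for enabling complex cognitive behaviors, such as self-verification and backtracking, that are needed to solve complex tasks. However, this capability often leads to "overthinking", in which the model generates redundant reasoning paths that increase computational cost without improving accuracy. Supervised Fine-Tuning (SFT) on reasoning traces is the standard paradigm for the "cold start" phase, but applying existing compression techniques to these traces either damages logical coherence or incurs prohibitively high sampling costs. This paper introduces ConMax (Confidence-Maximizing Compression), a novel reinforcement learning framework that automatically compresses reasoning traces while preserving essential reasoning patterns. ConMax formulates compression as a reward-driven optimization problem and trains a policy to prune redundancy by maximizing a weighted combination of two signals obtained from a frozen auxiliary LRM: answer confidence, which measures predictive fidelity, and thinking confidence, which measures reasoning validity. Extensive experiments across five reasoning datasets show that ConMax achieves a superior efficiency-performance trade-off. Specifically, it reduces inference length by 43% relative to strong baselines at the cost of only a 0.7% drop in accuracy, demonstrating its effectiveness in generating high-quality, efficient training data for LRMs.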
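
The abstract describes the reward as a weighted combination of answer confidence and thinking confidence, both obtained from a frozen auxiliary LRM, but it does not give the exact formula. The Python sketch below is one possible reading, not the authors' implementation. It assumes that each confidence is the geometric-mean token probability computed from per-token log-probabilities, and that a single weight `alpha` mixes the two terms. The function names, `alpha`, and the log-probability inputs are all hypothetical.

```python
import math
from typing import Sequence


def mean_token_confidence(logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp of the average log-probability.

    Assumption: this stands in for a "confidence" score. The abstract does
    not say how confidence is actually computed.
    """
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(sum(logprobs) / len(logprobs))


def weighted_confidence_reward(
    answer_logprobs: Sequence[float],
    thinking_logprobs: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Weighted combination of answer confidence and thinking confidence.

    The log-probabilities are assumed to come from a frozen auxiliary model
    that scores the answer tokens and the reasoning tokens. `alpha` is a
    hypothetical mixing weight in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    answer_conf = mean_token_confidence(answer_logprobs)
    thinking_conf = mean_token_confidence(thinking_logprobs)
    return alpha * answer_conf + (1.0 - alpha) * thinking_conf
```

Under these assumptions, a compressed trace that keeps the auxiliary model confident in both the final answer and the reasoning scores close to 1, while a trace that drops essential steps lowers one or both terms. Raising `alpha` favors answer fidelity, and lowering it favors reasoning validity.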
