Fugu-MT Paper Translation (Abstract): Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning

Paper overview: Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning

arxiv url: http://arxiv.org/abs/2604.09150v1
Date: Fri, 10 Apr 2026 09:31:41 GMT
Status: Translation complete
System update date: 2026-04-13 17:57:53.799051
Title: Think Less, Know More: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning
Title (reference translation): What You Should Know: State-Aware Reasoning Compression with Knowledge Guidance for Efficient Reasoning
Authors: Yi Sui, Chaozhuo Li, Dawei Song
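Abstract summary: Large Reasoning Models (LRMs) achieve strong performance on complex tasks by leveraging long Chain-of-Thought (CoT). Existing CoT compression methods struggle to balance accuracy and efficiency, and lack fine-grained, step-level adaptation to redundancy and reasoning bias. We propose State-Aware Reasoning Compression with Knowledge Guidance (STACK), a framework that performs step-wise CoT compression.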
Reference score (independently computed attention score): 16.46227355517168
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Reasoning Models (LRMs) achieve strong performance on complex tasks by leveraging long Chain-of-Thought (CoT), but often suffer from overthinking, leading to excessive reasoning steps and high inference latency. Existing CoT compression methods struggle to balance accuracy and efficiency, and lack fine-grained, step-level adaptation to redundancy and reasoning bias. Therefore, we propose State-Aware Reasoning Compression with Knowledge Guidance (STACK), a framework that performs step-wise CoT compression by explicitly modeling stage-specific redundancy sources and integrating with a retrieval-augmented guidance. STACK constructs online long-short contrastive samples and dynamically switches between knowledge-guided compression for uncertain or biased reasoning state and self-prompted compression for overly long but confident state, complemented by an answer-convergence-based early stopping mechanism to suppress redundant verification. We further propose a reward-difference-driven training strategy by combining Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), enabling models to learn state-conditioned compression strategies. Experiments on three mathematical reasoning benchmarks show that STACK achieves a superior accuracy-efficiency balance, reducing average response length by 59.9% while improving accuracy by 4.8 points over existing methods.
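Abstract (reference translation): Large Reasoning Models (LRMs) achieve strong performance on complex tasks by leveraging long Chain-of-Thought (CoT), but often overthink, which leads to excessive reasoning steps and high inference latency. Existing CoT compression methods struggle to balance accuracy and efficiency, and lack fine-grained, step-level adaptation to redundancy and reasoning bias. We therefore propose State-Aware Reasoning Compression with Knowledge Guidance (STACK), a framework that performs step-wise CoT compression by explicitly modeling stage-specific sources of redundancy and integrating retrieval-augmented guidance. STACK constructs online long-short contrastive samples and dynamically switches between knowledge-guided compression for uncertain or biased reasoning states and self-prompted compression for overly long but confident states, complemented by an answer-convergence-based early stopping mechanism that suppresses redundant verification. We further propose a reward-difference-driven training strategy that combines Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), enabling models to learn state-conditioned compression strategies. Experiments on three mathematical reasoning benchmarks show that STACK achieves a superior accuracy-efficiency balance, reducing average response length by 59.9% while improving accuracy by 4.8 points over existing methods.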
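
The abstract names two control decisions: switching the compression mode based on the reasoning state, and stopping early once the answer has converged. The following is a minimal illustrative sketch of how such decisions could look, not the authors' implementation. The function names, the confidence threshold, the length budget, and the patience value are all hypothetical, since the abstract does not specify how states are detected or how convergence is measured.

```python
from typing import List, Optional, Sequence, Tuple


def choose_compression_mode(confidence: float, num_tokens: int,
                            conf_threshold: float = 0.7,
                            length_budget: int = 256) -> str:
    """Hypothetical state-conditioned switch (thresholds are assumptions).

    Low confidence stands in for an "uncertain or biased" state and
    selects knowledge-guided compression; a confident but overly long
    state selects self-prompted compression.
    """
    if confidence < conf_threshold:
        return "knowledge_guided"
    if num_tokens > length_budget:
        return "self_prompted"
    return "none"


def converged_answer(step_answers: Sequence[Optional[str]],
                     patience: int = 3) -> Optional[str]:
    """Return the answer if the last `patience` steps agree, else None."""
    if len(step_answers) < patience:
        return None
    tail = step_answers[-patience:]
    if tail[0] is not None and all(a == tail[0] for a in tail):
        return tail[0]
    return None


def run_with_early_stop(step_answers: Sequence[Optional[str]],
                        patience: int = 3) -> Tuple[Optional[str], int]:
    """Consume per-step intermediate answers and stop once they converge.

    Returns the final answer and the number of steps actually used.
    """
    seen: List[Optional[str]] = []
    for step_index, answer in enumerate(step_answers, start=1):
        seen.append(answer)
        final = converged_answer(seen, patience)
        if final is not None:
            return final, step_index
    return (seen[-1] if seen else None), len(seen)


if __name__ == "__main__":
    print(choose_compression_mode(confidence=0.4, num_tokens=100))
    print(choose_compression_mode(confidence=0.9, num_tokens=500))
    # Steps 6 and 7 would be redundant verification and are skipped.
    print(run_with_early_stop([None, "12", "14", "14", "14", "14", "14"]))
```

In this sketch, agreement across consecutive intermediate answers is used as a stand-in for answer convergence. A real system would need some way to extract an intermediate answer at each reasoning step, which is outside what the abstract describes.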

Related papers