Fugu-MT Paper Translation (Abstract): Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

Paper overview: Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

arxiv url: http://arxiv.org/abs/2604.04120v1
Date: Sun, 05 Apr 2026 13:43:12 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.930315
Title: Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
Authors: Lingjie Zeng, Xiaofan Chen, Yanbo Wang, Xiuying Chen
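Abstract summary: Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost. We conduct the first systematic empirical study of how CoT compression affects model trustworthiness. We find that CoT compression frequently introduces trustworthiness regressions and that different methods exhibit markedly different degradation profiles across dimensions.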
Reference score (attention level, computed independently by the site): 19.669117846064562
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-training, are encoded in the same parameter space that compression modifies. This means preserving accuracy does not, a priori, guarantee preserving trustworthiness. We conduct the first systematic empirical study of how CoT compression affects model trustworthiness, evaluating multiple models of different scales along three dimensions: safety, hallucination resistance, and multilingual robustness. Under controlled comparisons, we find that CoT compression frequently introduces trustworthiness regressions and that different methods exhibit markedly different degradation profiles across dimensions. To enable fair comparison across bases, we propose a normalized efficiency score for each dimension that reveals how naïve scalar metrics can obscure trustworthiness trade-offs. As an existence proof, we further introduce an alignment-aware DPO variant that reduces CoT length by 19.3% on reasoning benchmarks with substantially smaller trustworthiness loss. Our findings suggest that CoT compression should be optimized not only for efficiency but also for trustworthiness, treating both as equally important design constraints.

Related papers