Fugu-MT Paper Translation (Abstract): TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL

Paper summary: TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL

arxiv url: http://arxiv.org/abs/2603.17449v1
Date: Wed, 18 Mar 2026 07:45:39 GMT
Status: Translation complete
Last updated in system: 2026-03-19 18:32:57.572007
Title: TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL
Title (reference translation): TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL
Authors: Tingcheng Bian, Jinchang Luo, Mingquan Cheng, Jinyu Zhang, Xiaoling Xia, Ni Li, Yan Tao, Haiwei Wang
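Abstract summary: To maximize Intelligence per Token, we introduce a theoretical metric, MSL (Minimal Sufficient Length). TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.
Reference score (independently computed attention measure): 8.709290296692197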
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models achieve breakthroughs in complex reasoning via long chain-of-thought (CoT) sequences. However, this often leads to severe reasoning inflation, causing substantial computational redundancy. To maximize Intelligence per Token, we introduce a theoretical metric, MSL (Minimal Sufficient Length). MSL rigorously characterizes the shortest reasoning length that preserves answer correctness. We provide a recursive definition based on independently sampled sequences and prove the existence of its limit, establishing the first measurable lower bound for reasoning-chain compression. Building on an analysis of mainstream CoT compression strategies, we identify key structural factors enabling a model to approach MSL. Based on these insights, we propose TRiMS, which employs the GRPO algorithm in conjunction with MSL-based estimation during training, while mitigating training instabilities through dynamic batch aggregation and advantage computation using batch-level standard deviation. TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.
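Abstract (reference translation): Large language models achieve breakthroughs in complex reasoning through long chain-of-thought sequences. However, this often causes severe reasoning inflation and, with it, substantial computational redundancy. To maximize Intelligence per Token, we introduce a theoretical metric, MSL (Minimal Sufficient Length). MSL rigorously characterizes the shortest reasoning length that preserves answer correctness. We give a recursive definition based on independently sampled sequences and prove that its limit exists, establishing the first measurable lower bound for reasoning-chain compression. Based on an analysis of mainstream CoT compression strategies, we identify the key structural factors that allow a model to approach MSL. Building on these findings, we propose TRiMS, which combines the GRPO algorithm with MSL-based estimation during training and mitigates training instabilities through dynamic batch aggregation and advantage computation using batch-level standard deviation. TRiMS achieves over 80% CoT token reduction with a minor accuracy boost across all benchmarks.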
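
The abstract's "advantage computation using batch-level standard deviation" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `grpo_advantages`, the input layout, the `eps` constant, and the choice to take the standard deviation over group-centred rewards are all assumptions made for illustration. The sketch centres each reward on the mean of its own group, as in the usual GRPO baseline, and then divides by one standard deviation computed over the whole batch instead of one per group.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards_by_prompt, eps=1e-6):
    """Group-centred advantages scaled by a single batch-level std.

    rewards_by_prompt: list of non-empty lists; each inner list holds the
    scalar rewards of the responses sampled for one prompt (one group).
    Hypothetical helper, not taken from the paper.
    """
    # Centre every reward on the mean of its own group.
    centred = [[r - mean(group) for r in group] for group in rewards_by_prompt]

    # Assumption: one std over all centred rewards in the batch replaces
    # the per-group std of the usual GRPO normalisation.
    flat = [c for group in centred for c in group]
    batch_std = pstdev(flat)

    # eps guards against division by zero when every reward in the batch
    # equals its group mean.
    return [[c / (batch_std + eps) for c in group] for group in centred]


if __name__ == "__main__":
    # Two prompts, two sampled responses each; reward 1.0 means correct.
    print(grpo_advantages([[1.0, 0.0], [1.0, 1.0]]))
```

Under these assumptions, a group whose rewards are all identical (the second group in the usage example) receives exactly zero advantage, and its scale no longer depends on a near-zero per-group standard deviation, which is one plausible reading of why a batch-level statistic could stabilize training.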

Related papers