Fugu-MT Paper Translation (Summary): Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models

Paper summary: Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models

arxiv url: http://arxiv.org/abs/2603.10335v1
Date: Wed, 11 Mar 2026 02:11:25 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.748869
Title: Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models
Title (reference translation): Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models
Authors: Yuedong Yang, Xiwen Wei, Mustafa Munir, Radu Marculescu
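Abstract summary: Reasoning Large Multi-modality Models (LMMs) have become the de facto choice for many applications. These models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime. This paper proposes Fuel Gauge, the first method that extracts a hidden signal, representing the "fuel" available to the reasoning process, and predicts CoT length ahead of time.
Reference score (independently computed attention metric): 31.015906980192543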
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reasoning Large Multi-modality Models (LMMs) have become the de facto choice for many applications. However, these models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime, often resulting in inefficient use of computational resources (due to memory fragmentation) and sub-optimal accuracy (due to under- and over-thinking). We observe empirically that the CoT process follows a very simple form, whose behavior is independent of the specific generated samples. This suggests that the CoT length can be estimated ahead of time based on a hidden parameter representing the amount of "fuel" available to support the reasoning process. Based on this insight, we propose Fuel Gauge, the first method that extracts this hidden signal and predicts CoT length ahead of time. We demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks demonstrate the effectiveness, generalizability, and practical value of our Fuel Gauge. For example, on the GPQA-Diamond benchmark, our Fuel Gauge achieves less than half the CoT length prediction error compared to the baseline; this translates into a 13.37x reduction in the memory allocation frequency.
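Abstract (reference translation): Reasoning Large Multi-modality Models (LMMs) have become the de facto choice in many applications. However, these models depend on a Chain-of-Thought (CoT) process that is long and unpredictable at runtime, which often leads to inefficient use of computational resources (because of memory fragmentation) and sub-optimal accuracy (because of under-thinking and over-thinking). The authors observe empirically that the CoT process follows a very simple form whose behavior is independent of the specific generated samples. This suggests that CoT length can be estimated in advance from a hidden parameter representing the amount of "fuel" available to support the reasoning process. Building on this insight, they propose Fuel Gauge, the first method that extracts this hidden signal and predicts CoT length ahead of time. They demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks show the effectiveness, generalizability, and practical value of Fuel Gauge. For example, on the GPQA-Diamond benchmark, Fuel Gauge achieves less than half the CoT length prediction error of the baseline, which translates into a 13.37x reduction in memory allocation frequency.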
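Illustrative note: the abstract links better CoT length prediction to fewer memory allocations. The toy sketch below shows one way that link can arise; it is not the paper's implementation. The block size, the function names, and the fallback policy (one extra allocator call per block once the reservation runs out) are all assumptions made for illustration only.

```python
import math

# Hypothetical KV-cache block size in tokens (an assumption, not from the paper).
BLOCK_TOKENS = 16


def reactive_allocations(total_tokens: int, block_tokens: int = BLOCK_TOKENS) -> int:
    """Allocator calls when one block is requested each time the cache fills up."""
    return math.ceil(total_tokens / block_tokens)


def predictive_allocations(total_tokens: int, predicted_tokens: int,
                           block_tokens: int = BLOCK_TOKENS) -> int:
    """Allocator calls when the predicted length is reserved in a single call.

    If the sequence outgrows the reservation, the remainder falls back to
    one call per extra block (assumed policy).
    """
    reserved_blocks = math.ceil(predicted_tokens / block_tokens)
    reserved_tokens = reserved_blocks * block_tokens
    overflow = max(0, total_tokens - reserved_tokens)
    upfront_calls = 1 if reserved_blocks > 0 else 0
    return upfront_calls + math.ceil(overflow / block_tokens)


if __name__ == "__main__":
    actual = 1000  # hypothetical CoT length in tokens
    print(reactive_allocations(actual))          # 63
    print(predictive_allocations(actual, 900))   # 7 (under-prediction)
    print(predictive_allocations(actual, 1200))  # 1 (over-prediction)
```

In this toy model an under-prediction costs extra allocator calls, while an over-prediction avoids them but reserves memory that goes unused, so the accuracy of the length estimate affects both allocation frequency and memory use.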


Related papers