Fugu-MT paper translation (abstract): Does Visual Token Pruning Improve Calibration? An Empirical Study on Confidence in MLLMs

Paper summary: Does Visual Token Pruning Improve Calibration? An Empirical Study on Confidence in MLLMs

arxiv url: http://arxiv.org/abs/2604.12035v1
Date: Mon, 13 Apr 2026 20:24:03 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.111775
Title: Does Visual Token Pruning Improve Calibration? An Empirical Study on Confidence in MLLMs
Authors: Kaizhen Tan
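Abstract summary: This study examines how visual token pruning affects model calibration, that is, whether predicted confidence matches actual correctness. The results suggest that pruning does not simply trade reliability for efficiency.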
Reference score (attention score computed independently by this site): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visual token pruning is a widely used strategy for efficient inference in multimodal large language models (MLLMs), but existing work mainly evaluates it with task accuracy. In this paper, we study how visual token pruning affects model calibration, that is, whether predicted confidence matches actual correctness. Using LLaVA-1.5-7B on POPE and ScienceQA-IMG, we evaluate Expected Calibration Error (ECE), Brier score, and AURC under several pruning strategies, including SCOPE with different saliency weights, saliency-only pruning, FastV, and random pruning, across multiple token budgets. Our results show that pruning does not simply trade reliability for efficiency. On POPE, a pure-coverage setting in SCOPE achieves substantially lower ECE than the full unpruned model while maintaining similar accuracy. An internal alpha-sweep further shows a consistent trend: reducing the saliency weight improves calibration at all tested token budgets, while accuracy changes only slightly. In contrast, saliency-based pruning leads to worse calibration, and real FastV causes severe performance degradation in our setting. On ScienceQA-IMG, pruning also reduces ECE, with accuracy remaining stable or slightly improving. We additionally study the gap power exponent in coverage-based selection and find that its default setting is not always optimal. Overall, our results suggest that visual token pruning should be evaluated not only by accuracy, but also by confidence quality, especially for multimodal systems that need reliable decisions.
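As a rough illustration of the three confidence metrics named in the abstract, the sketch below computes ECE, Brier score, and AURC from per-example confidences and 0/1 correctness labels. It is not the paper's evaluation code: the function names, the 10 equal-width bins, and the binary (confidence versus correctness) form of the Brier score are assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins (assumed scheme): the size-weighted
    mean of |accuracy - mean confidence| over non-empty bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(acc - avg_conf)
    return ece


def brier_score(confidences, correct):
    """Mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(confidences, correct)) / len(confidences)


def aurc(confidences, correct):
    """Area under the risk-coverage curve: the error rate among the k most
    confident predictions, averaged over k = 1..n."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors = 0
    total_risk = 0.0
    for k, i in enumerate(order, start=1):
        errors += 1 - correct[i]
        total_risk += errors / k
    return total_risk / len(order)


if __name__ == "__main__":
    # Toy values, not results from the paper.
    conf = [0.95, 0.85, 0.75, 0.65]
    hit = [1, 1, 0, 1]
    print("ECE  :", expected_calibration_error(conf, hit))
    print("Brier:", brier_score(conf, hit))
    print("AURC :", aurc(conf, hit))
```

Lower is better for all three. Under this setup, comparing a pruned and an unpruned model would mean running the same functions on each model's confidences for the same evaluation set.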
