Fugu-MT Paper Translation (Summary): Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

Paper overview: Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models

arxiv url: http://arxiv.org/abs/2603.21426v1
Date: Sun, 22 Mar 2026 22:33:25 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.411851
Title: Uncertainty-Aware Knowledge Distillation for Multimodal Large Language Models
Authors: Jingchen Sun, Shaobo Han, Deep Patel, Wataru Kohno, Can Jin, Changyou Chen,
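Abstract summary: Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. This work proposes Beta-KD, an uncertainty-aware distillation framework that governs how much the student relies on teacher guidance.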
Reference score (site-computed attention score): 26.06143154557816
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge distillation establishes a learning paradigm that leverages both data supervision and teacher guidance. However, determining the optimal balance between learning from data and learning from the teacher is challenging, as some samples may be noisy while others are subject to teacher uncertainty. This motivates the need for adaptively balancing data and teacher supervision. We propose Beta-weighted Knowledge Distillation (Beta-KD), an uncertainty-aware distillation framework that adaptively modulates how much the student relies on teacher guidance. Specifically, we formulate teacher-student learning from a unified Bayesian perspective and interpret teacher supervision as a Gibbs prior over student activations. This yields a closed-form, uncertainty-aware weighting mechanism and supports arbitrary distillation objectives and their combinations. Extensive experiments on multimodal VQA benchmarks demonstrate that distilling student Vision-Language Models from a large teacher VLM consistently improves performance. The results show that Beta-KD outperforms existing knowledge distillation methods. The code is available at https://github.com/Jingchensun/beta-kd.
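The abstract says that interpreting teacher supervision as a Gibbs prior over student activations yields a closed-form weighting, but it does not give the formula. The following is a minimal, hypothetical sketch of how such a closed form can arise. It is not the Beta-KD implementation, which is in the linked repository. The sketch assumes a squared-error energy E = ||z_s - z_t||^2 between student activations z_s and teacher activations z_t, and a Gibbs prior p(z_s) ∝ exp(-beta * E). For d-dimensional activations, the normaliser of that prior is (pi / beta)^(d/2). Minimising the resulting negative log-prior over the precision beta gives beta* = d / (2E), so samples where student and teacher disagree more receive a smaller teacher weight. All function names and the choice of energy are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: assumes a squared-error energy between student and teacher
# activations. This is an illustration of a Gibbs-prior closed form, not the Beta-KD code.


def squared_distance(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Energy E = ||z_s - z_t||^2 between student and teacher activations."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher activations must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher))


def neg_log_gibbs_prior(beta: float, energy: float, dim: int) -> float:
    """-log p(z_s) for the Gibbs prior p(z_s) ∝ exp(-beta * E).

    For this squared-error energy the normaliser is Z(beta) = (pi / beta) ** (dim / 2),
    so -log p = beta * E + (dim / 2) * log(pi / beta).
    """
    return beta * energy + 0.5 * dim * math.log(math.pi / beta)


def closed_form_beta(energy: float, dim: int, eps: float = 1e-8) -> float:
    """Minimiser of neg_log_gibbs_prior over beta: E - dim / (2 * beta) = 0."""
    return dim / (2.0 * max(energy, eps))


def teacher_weights(
    students: List[Sequence[float]], teachers: List[Sequence[float]]
) -> List[Tuple[float, float]]:
    """Per-sample (energy, beta*) pairs: larger disagreement gives a smaller teacher weight."""
    pairs = []
    for z_s, z_t in zip(students, teachers):
        energy = squared_distance(z_s, z_t)
        pairs.append((energy, closed_form_beta(energy, len(z_s))))
    return pairs


if __name__ == "__main__":
    # Hypothetical 2-D activations for two samples. The teacher is close to the
    # student on the first sample and far from it on the second.
    students = [[0.0, 0.0], [0.0, 0.0]]
    teachers = [[0.5, 0.5], [2.0, 2.0]]
    for energy, beta in teacher_weights(students, teachers):
        print(f"E={energy:.2f}  beta*={beta:.3f}")
```

For the two example samples, the script prints beta* = 2.000 for the close pair (E = 0.50) and beta* = 0.125 for the distant pair (E = 8.00). Other distillation objectives change the energy and therefore the normaliser Z(beta). This particular closed form therefore applies only to the assumed squared-error case.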
