Fugu-MT paper translation (abstract): Generative Score Inference for Multimodal Data

Paper summary: Generative Score Inference for Multimodal Data

arxiv url: http://arxiv.org/abs/2603.26349v1
Date: Fri, 27 Mar 2026 12:24:12 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.495394
Title: Generative Score Inference for Multimodal Data
Authors: Xinyu Tian, Xiaotong Shen
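Abstract summary: This paper introduces Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets. The authors empirically validate GSI in two representative scenarios: hallucination detection in large language models and uncertainty estimation in image captioning. The method achieves state-of-the-art performance in hallucination detection and robust predictive uncertainty in image captioning, and its performance improves with the quality of the underlying generative model.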
Reference score (independently computed attention metric): 11.857867207010981
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome these limitations, we introduce Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets across a wide range of multimodal learning problems. GSI utilizes synthetic samples generated by deep generative models to approximate conditional score distributions, facilitating precise uncertainty quantification without imposing restrictive assumptions about the data or tasks. We empirically validate GSI's capabilities through two representative scenarios: hallucination detection in large language models and uncertainty estimation in image captioning. Our method achieves state-of-the-art performance in hallucination detection and robust predictive uncertainty in image captioning, and its performance is positively influenced by the quality of the underlying generative model. These findings underscore the potential of GSI as a versatile inference framework, significantly enhancing uncertainty quantification and trustworthiness in multimodal learning.
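The abstract's central idea, approximating a conditional score distribution with synthetic samples and turning it into a prediction set, can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract gives no procedural detail, so the function names (`score_quantile`, `prediction_set`), the Gaussian stand-in for a deep generative model (`toy_sampler`), and the absolute-error score (`toy_score`) are all assumptions made purely for illustration.

```python
import math
import random


def score_quantile(sample_scores, alpha):
    """Empirical (1 - alpha) quantile of the synthetic scores (ceil-based rank)."""
    ordered = sorted(sample_scores)
    rank = math.ceil((1 - alpha) * len(ordered))
    rank = min(max(rank, 1), len(ordered))  # clamp to a valid 1-based rank
    return ordered[rank - 1]


def prediction_set(x, candidates, sampler, score, alpha=0.1, n_samples=2000, seed=0):
    """Keep candidates whose score does not exceed the synthetic-score quantile.

    sampler(x, rng) is a hypothetical stand-in for a conditional generative
    model; score(x, y) is a hypothetical nonconformity score (lower = more
    plausible). Both are assumptions, not part of the paper's stated method.
    """
    rng = random.Random(seed)
    synthetic = [sampler(x, rng) for _ in range(n_samples)]
    # Monte Carlo approximation of the conditional score distribution at x.
    scores = [score(x, y) for y in synthetic]
    threshold = score_quantile(scores, alpha)
    kept = [y for y in candidates if score(x, y) <= threshold]
    return kept, threshold


def toy_sampler(x, rng):
    """Toy 'generative model': y given x is x plus standard Gaussian noise."""
    return x + rng.gauss(0.0, 1.0)


def toy_score(x, y):
    """Toy score: absolute deviation of y from x."""
    return abs(y - x)


if __name__ == "__main__":
    grid = [i / 10 for i in range(-50, 51)]
    kept, threshold = prediction_set(0.0, grid, toy_sampler, toy_score, alpha=0.1)
    print(f"threshold={threshold:.3f}, set spans [{min(kept)}, {max(kept)}]")
```

In this toy setting the threshold estimates the 90% quantile of |Z| for standard normal Z, so the resulting set is roughly a symmetric interval around `x`. With a real generative model, the quality of the synthetic samples would determine how well the threshold tracks the true conditional score distribution, which is consistent with the abstract's remark that performance depends on the quality of the underlying generative model.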


Related papers