Fugu-MT Paper Translation (Summary): MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Paper summary: MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

arxiv url: http://arxiv.org/abs/2604.24564v2
Date: Thu, 30 Apr 2026 01:34:01 GMT
Status: Translation complete
Last updated in system: 2026-05-01 14:06:12.646564
Title: MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG
Authors: Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao
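Abstract summary: Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs). The paper proposes Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. MEG-RAG is a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth.
Reference score (attention score computed independently by this site): 29.065833225528127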
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. By prioritizing high-value content based on semantic grounding rather than token probability distributions, MEG-RAG improves the accuracy and multimodal consistency of generated outputs. Extensive experiments on the M$^2$RAG benchmark show that MEG-RAG consistently outperforms strong baselines and demonstrates robust generalization across different teacher models.
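The abstract does not state the MEG formula, so the following is only a minimal sketch of the general idea it describes: scoring a piece of evidence by how much it helps on the high-IDF, information-bearing tokens of the ground-truth answer. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the smoothed IDF variant, the top-k rule for choosing anchor tokens, and the use of a mean log-probability gain. The per-token log-probabilities are passed in as plain lists; in practice they would come from a teacher model scoring the answer with and without the evidence.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF for every token in a list of tokenized documents."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + df)) + 1.0
        for tok, df in doc_freq.items()
    }


def anchor_indices(answer_tokens, idf, top_k):
    """Positions of the top_k answer tokens with the highest IDF.

    Hypothetical anchor rule: tokens missing from the IDF table are
    treated as rare and receive the largest IDF seen in the table.
    """
    default_idf = max(idf.values(), default=1.0)
    ranked = sorted(
        range(len(answer_tokens)),
        key=lambda i: idf.get(answer_tokens[i], default_idf),
        reverse=True,
    )
    return sorted(ranked[:top_k])


def grounding_score(answer_tokens, logp_with, logp_without, idf, top_k=2):
    """Mean log-probability gain on anchor tokens when evidence is added.

    logp_with / logp_without are per-token log-probabilities of the
    ground-truth answer with and without the retrieved evidence.
    """
    anchors = anchor_indices(answer_tokens, idf, top_k)
    gains = [logp_with[i] - logp_without[i] for i in anchors]
    return sum(gains) / len(gains)


# Toy data (invented for illustration only).
corpus = [
    ["the", "tower", "is", "in", "paris"],
    ["the", "bridge", "is", "in", "london"],
    ["the", "tower", "is", "tall"],
]
answer = ["the", "tower", "is", "in", "paris"]
idf = idf_table(corpus)

logp_without = [-0.25, -1.25, -0.25, -0.25, -2.5]
logp_with = [-0.25, -0.25, -0.25, -0.25, -0.5]

print(anchor_indices(answer, idf, top_k=2))
print(grounding_score(answer, logp_with, logp_without, idf, top_k=2))
```

A score like this could serve as a training target for a reranker, with higher values marking evidence that raises the model's certainty on the answer's rare, content-bearing tokens rather than on function words. Whether MEG-RAG does this in the same way is not specified in the abstract.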

Related papers