Fugu-MT Paper Translation (Abstract): DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking

Paper abstract: DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking

arxiv url: http://arxiv.org/abs/2508.15876v1
Date: Thu, 21 Aug 2025 11:24:26 GMT
Status: Translation complete
Last updated in system: 2025-08-25 16:42:36.140229
Title: DeepMEL: A Multi-Agent Collaboration Framework for Multimodal Entity Linking
Authors: Fang Wang, Tianwei Yan, Zonghao Yang, Minghao Hu, Jun Zhang, Zhunchen Luo, Xiaoying Bai
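Abstract summary: Multimodal Entity Linking aims to associate textual and visual mentions with entities in a multimodal knowledge graph. Current methods face challenges such as incomplete contextual information, coarse cross-modal fusion, and the difficulty of jointly using large language models (LLMs) and large visual models (LVMs). The paper proposes DeepMEL, a new framework based on multi-agent collaborative reasoning.
Reference score (independently calculated attention level): 18.8210909297317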
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal Entity Linking (MEL) aims to associate textual and visual mentions with entities in a multimodal knowledge graph. Despite its importance, current methods face challenges such as incomplete contextual information, coarse cross-modal fusion, and the difficulty of jointly using large language models (LLMs) and large visual models (LVMs). To address these issues, we propose DeepMEL, a novel framework based on multi-agent collaborative reasoning, which achieves efficient alignment and disambiguation of textual and visual modalities through a role-specialized division strategy. DeepMEL integrates four specialized agents, namely Modal-Fuser, Candidate-Adapter, Entity-Clozer, and Role-Orchestrator, to complete end-to-end cross-modal linking through specialized roles and dynamic coordination. DeepMEL adopts a dual-modal alignment path that combines the fine-grained text semantics generated by the LLM with the structured image representation extracted by the LVM, significantly narrowing the modal gap. We design an adaptive iteration strategy that combines tool-based retrieval and semantic reasoning capabilities to dynamically optimize the candidate set and balance recall and precision. DeepMEL also unifies MEL tasks into a structured cloze prompt to reduce parsing complexity and enhance semantic comprehension. Extensive experiments on five public benchmark datasets demonstrate that DeepMEL achieves state-of-the-art performance, improving accuracy (ACC) by 1% to 57%. Ablation studies verify the effectiveness of all modules.
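
The abstract names two mechanisms that a small example may make more concrete: an adaptive loop that grows the candidate set until it looks adequate, and a structured cloze prompt that poses linking as filling a blank. The following is only a toy sketch under stated assumptions, not the authors' implementation. Every name in it (`Mention`, `KB`, `toy_retrieve`, `toy_is_sufficient`, `adapt_candidates`, `build_cloze_prompt`), the doubling schedule, and the prompt wording are hypothetical; a substring search stands in for tool-based retrieval, a word-overlap check stands in for LLM semantic reasoning, and a plain string stands in for the LVM image representation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Mention:
    text: str               # surface form of the mention
    context: str            # sentence containing the mention
    image_description: str  # stand-in for an LVM-extracted image description


# Hypothetical four-entry knowledge base, used only for this demo.
KB = ["Apple Inc.", "Apple (fruit)", "Apple Records", "Big Apple (New York City)"]


def words(s: str) -> Set[str]:
    """Lowercased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", s.lower()))


def toy_retrieve(query: str, k: int) -> List[str]:
    """Stand-in for tool-based retrieval: first k KB names containing the query."""
    hits = [name for name in KB if query.lower() in name.lower()]
    return hits[:k]


def toy_is_sufficient(mention: Mention, candidates: List[str]) -> bool:
    """Stand-in for semantic reasoning: accept the set when some candidate
    shares a word (other than the mention itself) with the text or image evidence."""
    evidence = words(mention.context) | words(mention.image_description)
    mention_words = words(mention.text)
    return any((words(c) - mention_words) & evidence for c in candidates)


def adapt_candidates(
    mention: Mention,
    retrieve: Callable[[str, int], List[str]],
    is_sufficient: Callable[[Mention, List[str]], bool],
    start_k: int = 1,
    max_k: int = 8,
) -> List[str]:
    """Start with a small (precise) set and double k (more recall) until the
    judge accepts the set or the budget max_k is reached."""
    k = start_k
    candidates = retrieve(mention.text, k)
    while k < max_k and not is_sufficient(mention, candidates):
        k *= 2
        candidates = retrieve(mention.text, k)
    return candidates


def build_cloze_prompt(mention: Mention, candidates: List[str]) -> str:
    """Pose entity linking as a fill-in-the-blank question over the candidates."""
    options = "\n".join(f"{i + 1}. {name}" for i, name in enumerate(candidates))
    return (
        f"Context: {mention.context}\n"
        f"Image description: {mention.image_description}\n"
        f'The mention "{mention.text}" refers to the entity [MASK].\n'
        f"Candidates:\n{options}"
    )


mention = Mention(
    text="Apple",
    context="The Beatles released the album on Apple, their own records company.",
    image_description="a vinyl sleeve with a green apple logo",
)
candidates = adapt_candidates(mention, toy_retrieve, toy_is_sufficient)
prompt = build_cloze_prompt(mention, candidates)
print(prompt)
```

In this toy run the loop widens the set from one to two to four candidates before the overlap check accepts it, which illustrates the recall and precision trade-off the abstract mentions. In a real system the final prompt would be sent to a language model to fill the `[MASK]` slot; that step is left out here because the abstract does not specify it.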

Related papers