Fugu-MT Paper Translation (Abstract): Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

Paper overview: Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks

arxiv url: http://arxiv.org/abs/2603.11689v1
Date: Thu, 12 Mar 2026 08:56:14 GMT
Status: Translation complete
Last updated in system: 2026-03-13 14:46:25.977689
Title: Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks
Title (reference translation): An Explicit Logic Channel for Validating and Enhancing MLLMs on Zero-Shot Tasks
Authors: Mei Chee Leong, Ying Gu, Hui Li Tan, Liyuan Li, Nancy Chen
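Abstract summary: We propose an Explicit Logic Channel that performs explicit logical reasoning for model validation, selection, and enhancement. A frontier MLLM, which encapsulates latent vision-language knowledge, can be regarded as an Implicit Logic Channel. Cross-channel integration, grounded in explicit visual evidence, further improves zero-shot task performance over MLLMs.
Reference score (independently computed attention score): 6.788319595251597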
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Frontier Multimodal Large Language Models (MLLMs) exhibit remarkable capabilities in Visual-Language Comprehension (VLC) tasks. However, they are often deployed as zero-shot solutions to new tasks in a black-box manner. Validating and understanding the behavior of these models therefore becomes important when applying them to a new task. We propose an Explicit Logic Channel (ELC), in parallel with the black-box model channel, to perform explicit logical reasoning for model validation, selection, and enhancement. The frontier MLLM, encapsulating latent vision-language knowledge, can be considered an Implicit Logic Channel. The proposed Explicit Logic Channel, mimicking human logical reasoning, incorporates an LLM, a VFM, and logical reasoning with probabilistic inference for factual, counterfactual, and relational reasoning over the explicit visual evidence. A Consistency Rate (CR) is proposed for cross-channel validation and model selection, even without ground-truth annotations. Additionally, cross-channel integration further improves performance in zero-shot tasks over MLLMs, grounded in explicit visual evidence to enhance trustworthiness. Comprehensive experiments are conducted on two representative VLC tasks, i.e., MC-VQA and HC-REC, on three challenging benchmarks, with 11 recent open-source MLLMs from 4 frontier families. Our systematic evaluations demonstrate the effectiveness of the proposed ELC and CR for model validation, selection, and improvement of MLLMs, with enhanced explainability and trustworthiness.
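Abstract (reference translation): Frontier Multimodal Large Language Models (MLLMs) show remarkable capabilities in Visual-Language Comprehension (VLC) tasks. However, they are often deployed in a black-box manner as zero-shot solutions to new tasks. Validating and understanding the behavior of these models becomes important for applying them to new tasks. We propose an Explicit Logic Channel, in parallel with the black-box model channel, to perform logical reasoning for model validation, selection, and enhancement. A frontier MLLM, which encapsulates latent vision-language knowledge, can be regarded as an Implicit Logic Channel. The proposed Explicit Logic Channel mimics human logical reasoning and combines an LLM, a VFM, and logical reasoning with probabilistic inference for factual, counterfactual, and relational reasoning over explicit visual evidence. The Consistency Rate (CR) is proposed for cross-channel validation and model selection without ground-truth annotations. In addition, cross-channel integration, grounded in explicit visual evidence to enhance trustworthiness, further improves zero-shot task performance over MLLMs. Comprehensive experiments on two representative VLC tasks, MC-VQA and HC-REC, were conducted on three challenging benchmarks. The results show that the proposed ELC and CR are effective for model validation, selection, and improving the trustworthiness of MLLMs.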
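The abstract does not define how the Consistency Rate is computed. As a minimal sketch, assuming CR is simply the fraction of samples on which the implicit channel (the MLLM) and the explicit channel give the same answer, cross-channel validation and annotation-free model selection could look like the following. The function names `consistency_rate` and `select_model` are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Mapping, Sequence


def consistency_rate(implicit_preds: Sequence[Hashable],
                     explicit_preds: Sequence[Hashable]) -> float:
    """Assumed definition: share of samples where both channels agree.

    No ground-truth labels are used, only the two sets of predictions.
    """
    if len(implicit_preds) != len(explicit_preds):
        raise ValueError("both channels must predict on the same samples")
    if len(implicit_preds) == 0:
        raise ValueError("need at least one sample")
    agree = sum(a == b for a, b in zip(implicit_preds, explicit_preds))
    return agree / len(implicit_preds)


def select_model(mllm_preds: Mapping[str, Sequence[Hashable]],
                 explicit_preds: Sequence[Hashable]) -> str:
    """Pick the MLLM whose answers agree most with the explicit channel."""
    return max(
        mllm_preds,
        key=lambda name: consistency_rate(mllm_preds[name], explicit_preds),
    )


# Toy multiple-choice answers (illustrative values only, not paper data).
explicit = ["A", "B", "D", "D"]
candidates = {
    "model_1": ["A", "B", "C", "D"],  # agrees on 3 of 4 samples
    "model_2": ["A", "C", "C", "A"],  # agrees on 1 of 4 samples
}
best = select_model(candidates, explicit)
```

Under this assumed definition, a higher rate means the black-box model's answers are more often backed by the explicit reasoning channel, which is why it can serve as a selection signal when no labels are available.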

Related papers