Fugu-MT Paper Translation (Summary): Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow

Paper summary: Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow

arxiv url: http://arxiv.org/abs/2509.21789v1
Date: Fri, 26 Sep 2025 02:43:24 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.136594
Title: Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Title (reference translation): Visual Multi-Agent System: Mitigating Hallucination Snowballing with Visual Flow
Authors: Xinlei Yu, Chengming Xu, Guibin Zhang, Yongbo He, Zhangquan Chen, Zhucun Xue, Jiangning Zhang, Yue Liao, Xiaobin Hu, Yu-Gang Jiang, Shuicheng Yan
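Abstract summary: Multi-agent systems (MAS) powered by visual language models (VLMs) enable challenging tasks but suffer from a novel failure mode, visual hallucination snowballing. This work provides detailed insights into the essence of hallucination snowballing in terms of reduced visual attention allocation. It proposes ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by selected visual relay tokens and applies attention reallocation to amplify this pattern.
Reference score (independently computed attention score): 99.54291580187817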
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-Agent Systems (MAS) powered by Visual Language Models (VLMs) enable challenging tasks but suffer from a novel failure mode, multi-agent visual hallucination snowballing, in which hallucinations are seeded in a single agent and amplified by subsequent ones because of over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing in terms of reduced visual attention allocation. This leads us to identify a subset of vision tokens with a unimodal attention peak in the middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. Experimental results demonstrate that our method markedly reduces hallucination snowballing, consistently improving performance across eight benchmarks based on four common MAS structures and ten base models. The source code will be available at: https://github.com/YU-deep/ViF.git.
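Abstract (reference translation): Multi-agent systems (MAS) built on visual language models (VLMs) can accomplish difficult tasks, but they are affected by a new failure mode, multi-agent visual hallucination snowballing: a hallucination is seeded in a single agent and amplified by the agents that follow, because the system relies too heavily on textual flow to convey visual information. Through turn-wise, layer-wise, and token-wise attention analyses, the paper gives a detailed account of how hallucination snowballing relates to reduced visual attention allocation. The analysis identifies a subset of vision tokens with a unimodal attention peak in the middle layers; these tokens preserve visual evidence best, but they gradually diminish in deeper agent turns, which produces visual hallucination snowballing in MAS. The paper therefore proposes ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow based on the selected visual relay tokens and applies attention reallocation to amplify this pattern. The experiments show that the method markedly reduces hallucination snowballing and consistently improves performance on eight benchmarks across four common MAS structures and ten base models. The source code will be available at https://github.com/YU-deep/ViF.git.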
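The token-selection idea in the abstract (vision tokens whose attention has a single peak located in the middle layers) can be illustrated with a small sketch. This is not the authors' implementation: the function names `is_unimodal` and `select_relay_tokens`, the `attn[layer][token]` layout, and the explicit middle-layer range are all assumptions made for illustration, and the paper's actual criterion may differ.

```python
from typing import List, Sequence


def is_unimodal(profile: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if the values rise (weakly) to one peak and then fall (weakly)."""
    peak = max(range(len(profile)), key=lambda i: profile[i])
    rising = all(profile[i] <= profile[i + 1] + tol for i in range(peak))
    falling = all(
        profile[i] + tol >= profile[i + 1] for i in range(peak, len(profile) - 1)
    )
    return rising and falling


def select_relay_tokens(
    attn: Sequence[Sequence[float]], mid_start: int, mid_end: int
) -> List[int]:
    """Pick candidate relay tokens (hypothetical criterion, not the paper's exact rule).

    attn[layer][token] is assumed to hold the attention mass that each vision
    token receives at each layer. A token is kept when its layer-wise profile
    is unimodal and its peak layer lies in the range [mid_start, mid_end).
    """
    num_layers = len(attn)
    num_tokens = len(attn[0])
    selected = []
    for t in range(num_tokens):
        profile = [attn[layer][t] for layer in range(num_layers)]
        peak = max(range(num_layers), key=lambda layer: profile[layer])
        if mid_start <= peak < mid_end and is_unimodal(profile):
            selected.append(t)
    return selected


# Toy data: 5 layers x 3 vision tokens.
# Token 0 peaks once in the middle, token 1 peaks at the first layer,
# token 2 has two separate peaks.
toy_attn = [
    [0.1, 0.5, 0.1],
    [0.3, 0.4, 0.5],
    [0.6, 0.3, 0.2],
    [0.3, 0.2, 0.6],
    [0.1, 0.1, 0.1],
]
print(select_relay_tokens(toy_attn, mid_start=1, mid_end=4))  # [0]
```

On the toy data only token 0 qualifies: token 1 peaks outside the assumed middle range, and token 2 is bimodal. In a real VLM the attention values would come from the model's attention maps, and the middle-layer range would have to be chosen per model.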


Related papers