Fugu-MT Paper Translation (Summary): Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs

Paper Overview: Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs

arxiv url: http://arxiv.org/abs/2605.18172v1
Date: Mon, 18 May 2026 10:15:40 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:49.388148
Title: Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs
Authors: Junyu Pan, Yansen Wang, Enze Zhang, Baoliang Lu, Weilong Zheng, Dongsheng Li,
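Abstract summary: Generative Visual Grounding (GVG) visualizes the invisible by using an EEG-to-image generative model as a visual translator. GVG hallucinates instance-specific proxy images for non-visual EEG, allowing MLLMs to exploit their visual priors for clinical-state interpretation.
Reference score (in-house attention score): 41.987753428905734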
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Leveraging the universal representations of pre-trained LLMs and MLLMs offers a promising path toward brain foundation models. However, visually-evoked EEG datasets remain scarce, leading existing methods to align neural signals mainly with abstract text, a lossy translation that may discard fine-grained perceptual information encoded in brain activity. We propose Generative Visual Grounding (GVG), a framework that visualizes the invisible by using an EEG-to-image generative model as a visual translator. Instead of forcing EEG into text alone, GVG hallucinates instance-specific proxy images for non-visual EEG, providing structured visual contexts that allow MLLMs to exploit their visual priors for clinical-state interpretation. We validate this idea on two MLLM backbones, GVG-X-Omni and GVG-Janus. Image-only alignment is already competitive: the lightweight GVG-X-Omni matches 1.7B-parameter text-aligned baselines while tuning only 170M parameters on a frozen 7B backbone. We further extend GVG-Janus with trimodal Image+Text alignment, where text supplies categorical semantic anchors and visual proxies enrich neural representations with perceptual details. Experiments show consistent gains in EEG understanding and visual generation, suggesting visual proxy grounding as an effective complement to textual alignment.
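A quick back-of-the-envelope check of the efficiency figures quoted in the abstract. It uses the rounded numbers given there and assumes (the abstract does not say so explicitly) that the 170M tuned parameters of GVG-X-Omni sit on top of the frozen 7B backbone rather than inside it, and that the 1.7B figure is the baselines' total parameter count. The constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the parameter figures quoted in the abstract.
# Assumption: the 170M trainable parameters are added on top of the frozen 7B
# backbone, and 1.7B is the total parameter count of the text-aligned baselines.
FROZEN_BACKBONE = 7_000_000_000  # frozen 7B MLLM backbone
TRAINABLE_GVG = 170_000_000      # parameters tuned for GVG-X-Omni
TEXT_BASELINE = 1_700_000_000    # 1.7B-parameter text-aligned baselines


def trainable_fraction(trainable: int, frozen: int) -> float:
    """Share of all parameters that receive gradient updates."""
    return trainable / (trainable + frozen)


def tuning_ratio(baseline: int, ours: int) -> float:
    """How many times larger the baseline's parameter count is than the tuned set."""
    return baseline / ours


if __name__ == "__main__":
    frac = trainable_fraction(TRAINABLE_GVG, FROZEN_BACKBONE)
    ratio = tuning_ratio(TEXT_BASELINE, TRAINABLE_GVG)
    print(f"trainable share of total: {frac:.2%}")
    print(f"baseline / tuned parameters: {ratio:.1f}x")
```

Under these assumptions, roughly 2.4% of GVG-X-Omni's parameters are trained, about ten times fewer than the parameter count of the text-aligned baselines it matches.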

Related Papers