Fugu-MT Paper Translation (Summary): Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity

Paper overview: Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity

arxiv url: http://arxiv.org/abs/2606.00121v1
Date: Thu, 28 May 2026 09:20:10 GMT
Status: Translation complete
Last updated in system: 2026-06-02 21:34:27.980579
Title: Versatile Framework with Semantic and Structural guidance for Image Reconstruction from Brain Activity
Authors: Yizhuo Lu, Changde Du, Qiongyi Zhou, Liuyun Jiang, Huiguang He
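Abstract summary: We propose a two-stage image reconstruction framework called MindDiffuser. In Stage 1, Contrastive Language-Image Pretraining (CLIP) text embeddings decoded from brain responses are input into Stable Diffusion to generate a preliminary image. In Stage 2, decoded shallow CLIP visual features are used as supervisory signals, and the feature vectors from Stage 1 are iteratively refined via backpropagation to align structural information.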
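The abstract does not say which model decodes CLIP text embeddings from brain responses. As a minimal sketch, assuming a linear decoder, closed-form ridge regression (a common baseline in neural decoding, swapped in here for illustration and not confirmed as the paper's choice) can map brain-response vectors to flattened embedding targets. The data are synthetic, and the function names `fit_ridge` and `decode` as well as the dimensions (50 "voxels", 16-dimensional embeddings) are hypothetical. For reference, Stable Diffusion v1 is conditioned on 77×768 CLIP text-token embeddings, which a decoder like this would have to predict in flattened or per-token form.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y.

    X: brain responses, shape (n_samples, n_voxels)
    Y: target embeddings, shape (n_samples, emb_dim)
    """
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def decode(X, W):
    """Hypothetical decoder: map brain responses to predicted embeddings."""
    return X @ W

# Synthetic stand-in data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, emb_dim = 200, 20, 50, 16
true_map = rng.standard_normal((n_voxels, emb_dim))
X_train = rng.standard_normal((n_train, n_voxels))
Y_train = X_train @ true_map + 0.1 * rng.standard_normal((n_train, emb_dim))
X_test = rng.standard_normal((n_test, n_voxels))
Y_test = X_test @ true_map

W = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = decode(X_test, W)

# Pearson correlation between decoded and true embeddings on held-out samples.
corr = np.corrcoef(Y_pred.ravel(), Y_test.ravel())[0, 1]
print(f"decoded-vs-true correlation: {corr:.3f}")
```

The closed form is used only because it needs no iterative training; swapping in a different regressor would change `fit_ridge` and leave the rest of the sketch intact.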
Reference score (attention score computed in-house): 17.625829377712492
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reconstructing visual stimuli from brain recordings has been a meaningful and challenging task in brain decoding. Especially, the achievement of precise and controllable image reconstruction bears great significance in propelling the progress and utilization of brain-computer interfaces. Recent methods, leveraging advances in the power of text-to-image generation models, have reconstructed images that closely approximate complex natural stimuli in terms of semantics (e.g., concepts and objects). However, they struggle to maintain consistency with the original stimuli in fine-grained structural information (e.g., position, orientation and size), which undermines both the controllability and interpretability of the models. To address the aforementioned issues, we propose a two-stage image reconstruction framework, termed MindDiffuser. In Stage 1, Contrastive Language-Image Pretraining (CLIP) text embeddings decoded from brain responses are input into Stable Diffusion, generating a preliminary image containing semantic information. In Stage 2, we use decoded shallow CLIP visual features as supervisory signals, iteratively refining the feature vectors from Stage 1 via backpropagation to align structural information. We conducted extensive experiments on brain response datasets across three modalities (fMRI, EEG, MEG) elicited by visual stimuli, demonstrating that our framework significantly enhances the performance of previous state-of-the-art models, highlighting the effectiveness and versatility of our approach. Spatial and temporal visualization results further support the neurobiological plausibility of our framework, providing guidance for future neural decoding efforts across different brain signal modalities.
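Stage 2 refines the Stage 1 feature vectors by backpropagating the mismatch between features of the current reconstruction and the decoded shallow CLIP visual features. A minimal sketch of that idea, under strong simplifying assumptions: a fixed random linear map `A` stands in for the differentiable pipeline (image generation followed by a shallow CLIP visual encoder), the gradient is derived by hand instead of by autograd, and a proximity term weighted by `lam` keeps the vector near its Stage 1 value so that semantic content is not discarded. The proximity term, the function `refine`, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def refine(z0, A, f_target, lam=0.1, n_steps=1000):
    """Gradient descent on 0.5*||A z - f||^2 + 0.5*lam*||z - z0||^2.

    z0:       Stage 1 feature vector (starting point)
    A:        hypothetical linear stand-in for "generate image, extract shallow features"
    f_target: decoded shallow visual features used as the supervisory signal
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (largest singular value of A squared, plus lam).
    lipschitz = np.linalg.norm(A, 2) ** 2 + lam
    step = 1.0 / lipschitz
    z = z0.copy()
    for _ in range(n_steps):
        residual = A @ z - f_target              # structural mismatch
        grad = A.T @ residual + lam * (z - z0)   # gradient w.r.t. z (what autograd would backpropagate)
        z -= step * grad
    return z

rng = np.random.default_rng(0)
latent_dim, feat_dim = 64, 32
A = rng.standard_normal((feat_dim, latent_dim)) / np.sqrt(latent_dim)
z0 = rng.standard_normal(latent_dim)        # stand-in Stage 1 "semantic" vector
f_target = rng.standard_normal(feat_dim)    # stand-in decoded shallow visual features

z = refine(z0, A, f_target)

initial_mismatch = np.linalg.norm(A @ z0 - f_target)
final_mismatch = np.linalg.norm(A @ z - f_target)
print(f"feature mismatch: {initial_mismatch:.3f} -> {final_mismatch:.3f}")
```

Because this toy objective is a strongly convex quadratic, gradient descent with step size 1/L converges to its closed-form minimizer. The paper's actual objective passes through a diffusion model and a CLIP encoder, is nonconvex, and carries no such guarantee.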
