DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding
- URL: http://arxiv.org/abs/2504.00432v1
- Date: Tue, 01 Apr 2025 05:28:37 GMT
- Title: DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding
- Authors: Chong Li, Jingyang Huo, Weikang Gong, Yanwei Fu, Xiangyang Xue, Jianfeng Feng,
- Abstract summary: Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information.<n>We propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals.<n>It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video.
- Score: 82.91021399231184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding visual experiences from brain activity is a significant challenge. Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information. However, these aspects are all essential and are processed through distinct pathways in the brain. Motivated by this, we propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals. It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video. This approach not only simplifies the complex task of video decoding by decomposing it into manageable sub-tasks, but also establishes a clearer connection between learned representations and their biological counterpart, as supported by ablation studies. Further, our experiments show significant improvements over previous state-of-the-art methods, achieving 82.4% accuracy for semantic classification, 70.6% accuracy in spatial consistency, a 0.212 cosine similarity for motion prediction, and 21.9% 50-way accuracy for video generation. Additionally, neural encoding analyses for semantic and spatial information align with the two-streams hypothesis, further validating the distinct roles of the ventral and dorsal pathways. Overall, DecoFuse provides a strong and biologically plausible framework for fMRI-to-video decoding. Project page: https://chongjg.github.io/DecoFuse/.
Related papers
- Deep Neural Encoder-Decoder Model to Relate fMRI Brain Activity with Naturalistic Stimuli [2.7149743794003913]
We propose an end-to-end deep neural encoder-decoder model to encode and decode brain activity in response to naturalistic stimuli.<n>We employ temporal convolutional layers in our architecture, which effectively allows to bridge the temporal resolution gap between natural movie stimuli and fMRI.
arXiv Detail & Related papers (2025-07-16T08:08:48Z) - Perception Activator: An intuitive and portable framework for brain cognitive exploration [19.851643249367108]
We develop an experimental framework that uses fMRI representations as intervention conditions.<n>We compare both downstream performance and intermediate feature changes on object detection and instance segmentation tasks with and without fMRI information.<n>Our results prove that fMRI contains rich multi-object semantic cues and coarse spatial localization information-elements that current models have yet to fully exploit or integrate.
arXiv Detail & Related papers (2025-07-03T04:46:48Z) - Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction [13.110669865114533]
NEURONS is a concept framework that decouples learning into four correlated sub-tasks.<n>It simulates the visual cortex's functional specialization, allowing the model to capture diverse video content.<n>NEURONS shows a strong functional correlation with the visual cortex, highlighting its potential for brain-computer interfaces and clinical applications.
arXiv Detail & Related papers (2025-03-14T08:12:28Z) - MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data [64.92867794764247]
MindAligner is a framework for cross-subject brain decoding from limited fMRI data.
Brain Transfer Matrix (BTM) projects the brain signals of an arbitrary new subject to one of the known subjects.
Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli.
arXiv Detail & Related papers (2025-02-07T16:01:59Z) - Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%.
arXiv Detail & Related papers (2024-11-11T16:51:17Z) - MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method of multi-subject fMRI signals using so-called MindFormer.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning Stable Diffusion model for fMRI- to-image generation or large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.04953215936574]
We propose a two-stage model named Mind-Animator to reconstruct human dynamic vision from brain activity.<n>During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI.<n>In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion.
arXiv Detail & Related papers (2024-05-06T08:56:41Z) - Cinematic Mindscapes: High-quality Video Reconstruction from Brain
Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
arXiv Detail & Related papers (2023-05-19T13:44:25Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - Mind Reader: Reconstructing complex images from brain activities [16.78619734818198]
We focus on reconstructing the complex image stimuli from fMRI (functional magnetic resonance imaging) signals.
Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli rich in semantics.
We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images.
arXiv Detail & Related papers (2022-09-30T06:32:46Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs)
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.