Related papers: Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

URL: http://arxiv.org/abs/2405.03280v2
Date: Wed, 19 Feb 2025 05:02:08 GMT
Title: Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Xujin Li, Huiguang He
Abstract summary: We propose a two-stage model named Mind-Animator to reconstruct human dynamic vision from brain activity. During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI. In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion.
Score: 13.04953215936574
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. Although prior video reconstruction methods have made substantial progress, they still suffer from several limitations, including: (1) difficulty in simultaneously reconciling semantic (e.g. categorical descriptions), structure (e.g. size and color), and consistent motion information (e.g. order of frames); (2) low temporal resolution of fMRI, which poses a challenge in decoding multiple frames of video dynamics from a single fMRI frame; (3) reliance on video generation models, which introduces ambiguity regarding whether the dynamics observed in the reconstructed videos are genuinely derived from fMRI data or are hallucinations from the generative model. To overcome these limitations, we propose a two-stage model named Mind-Animator. During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI. Specifically, we employ fMRI-vision-language tri-modal contrastive learning to decode semantic features from fMRI and design a sparse causal attention mechanism for decoding multi-frame video motion features through a next-frame-prediction task. In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion, effectively eliminating external video data interference. Extensive experiments on multiple video-fMRI datasets demonstrate that our model achieves state-of-the-art performance. Comprehensive visualization analyses further elucidate the interpretability of our model from a neurobiological perspective. Project page: https://mind-animator-design.github.io/.
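As a rough illustration of the tri-modal contrastive step mentioned in the abstract, the Python sketch below computes a symmetric InfoNCE-style loss between fMRI embeddings and their paired image and text embeddings. It is not the authors' implementation: the function names, the temperature value, the choice of InfoNCE, and the plain sum of the two pairwise terms are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` is the positive pair of row i of `b`."""
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature

    def diagonal_cross_entropy(scores):
        # Row-wise log-softmax, shifted by the row max for numerical stability.
        shifted = scores - scores.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        # The matching pair sits on the diagonal.
        return -np.mean(np.diag(log_probs))

    # Average the a-to-b and b-to-a retrieval directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def tri_modal_loss(fmri_emb, image_emb, text_emb, temperature=0.07):
    """Hypothetical tri-modal objective: pull fMRI toward both image and text.

    Summing the two pairwise terms with equal weight is an assumption,
    not a detail taken from the paper.
    """
    return (info_nce(fmri_emb, image_emb, temperature)
            + info_nce(fmri_emb, text_emb, temperature))
```

Under these assumptions, a batch whose fMRI, image, and text embeddings are correctly paired should yield a lower loss than the same batch with the pairing shuffled.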

Related papers

SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance [52.34513874272676]
We introduce SemVideo, a novel fMRI-to-video reconstruction framework guided by hierarchical semantic information. At the core of SemVideo is SemMiner, a hierarchical guidance module that constructs three levels of semantic cues from the original video stimulus. We show that SemVideo achieves superior performance in both semantic alignment and temporal consistency, setting a new state-of-the-art in fMRI-to-video reconstruction.
arXiv Detail & Related papers (2026-02-25T11:47:09Z)
DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion [10.936858717759156]
We introduce DynaMind, a novel framework that reconstructs video by jointly modeling neural dynamics and semantic features. On the SEED-DV dataset, DynaMind sets a new state-of-the-art (SOTA), boosting reconstructed video accuracies by 12.5 and 10.3 percentage points. This marks a critical advancement, bridging the gap between neural dynamics and high-fidelity visual semantics.
arXiv Detail & Related papers (2025-09-01T06:52:08Z)
MindShot: Multi-Shot Video Reconstruction from fMRI with LLM Decoding [7.066210443745838]
We propose a novel divide-and-decode framework for multi-shot fMRI video reconstruction. Our core innovations are: (1) A shot boundary predictor module explicitly decomposing mixed fMRI signals into shot-specific segments. (2) Generative captioning using LLMs, which decodes robust textual descriptions from each segment, overcoming temporal blur by leveraging high-level semantics.
arXiv Detail & Related papers (2025-08-04T14:47:17Z)
DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding [82.91021399231184]
Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information. We propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals. It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video.
arXiv Detail & Related papers (2025-04-01T05:28:37Z)
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction [13.110669865114533]
NEURONS is a concept framework that decouples learning into four correlated sub-tasks. It simulates the visual cortex's functional specialization, allowing the model to capture diverse video content. NEURONS shows a strong functional correlation with the visual cortex, highlighting its potential for brain-computer interfaces and clinical applications.
arXiv Detail & Related papers (2025-03-14T08:12:28Z)
NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction [29.030311713701295]
We propose NeuroClips, an innovative framework to decode high-fidelity and smooth video from fMRI. NeuroClips utilizes a semantics reconstructor to reconstruct videos, guiding semantic accuracy and consistency, and employs a perception reconstructor to capture low-level perceptual details. NeuroClips achieves smooth high-fidelity video reconstruction of up to 6s at 8FPS, gaining significant improvements over state-of-the-art models in various metrics.
arXiv Detail & Related papers (2024-10-25T10:28:26Z)
Neural Representations of Dynamic Visual Stimuli [36.04425924379253]
We show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI. We show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model. This work offers a novel framework for interpreting how the human brain processes dynamic visual information.
arXiv Detail & Related papers (2024-06-04T17:59:49Z)
MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method for multi-subject fMRI signals called MindFormer. This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation. Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject. We show that our model captures the distinct functionalities of each region of human vision system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model. We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z)
MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals. Currently, brain decoding is confined to a per-subject-per-model paradigm. We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties [23.893490180665996]
We introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data. Tested on a publicly available fMRI dataset, our method shows promising results. Our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
arXiv Detail & Related papers (2024-02-02T17:34:25Z)
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z)
Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities [31.448924808940284]
We introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder.
arXiv Detail & Related papers (2023-05-26T19:16:23Z)
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance. We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
arXiv Detail & Related papers (2023-05-19T13:44:25Z)
Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding. Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
Natural scene reconstruction from fMRI signals using generative latent diffusion [1.90365714903665]
We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
arXiv Detail & Related papers (2023-03-09T15:24:26Z)
Exploring Motion and Appearance Information for Temporal Sentence Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding. We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations. Our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion. We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations. In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.