Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
- URL: http://arxiv.org/abs/2405.03280v1
- Date: Mon, 6 May 2024 08:56:41 GMT
- Title: Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity
- Authors: Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He
- Abstract summary: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance.
This paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets.
We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model.
- Score: 13.291585611137355
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged into videos by an inflated Stable Diffusion. We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model, through permutation tests. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.
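The abstract mentions permutation tests as the check that reconstructed dynamics come from fMRI rather than from the generative model. The listing does not describe the paper's actual procedure, so the following is only a generic sketch of a one-sided permutation test, assuming a scalar decoded motion signal per frame and a Pearson-correlation statistic. The function names `pearson` and `permutation_test` and the toy inputs are hypothetical and not taken from the paper.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_test(decoded, target, n_permutations=1000, seed=0):
    """One-sided permutation test (hypothetical helper, not the paper's code).

    Null hypothesis: the pairing between decoded values and target values
    carries no information, so shuffling the decoded values should match
    the target about as well as the original order does.
    """
    rng = random.Random(seed)
    observed = pearson(decoded, target)
    shuffled = list(decoded)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the decoded/target pairing
        if pearson(shuffled, target) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (assumed): the decoded signal is a linear function of the target.
target = [0.2 * i + 1 for i in range(20)]
decoded = [0.1 * i for i in range(20)]
statistic, p = permutation_test(decoded, target, n_permutations=200, seed=1)
print(statistic, p)
```

A small p-value here means that the original ordering matches the target better than almost all shuffled orderings, which is the kind of evidence a permutation test provides against the "hallucinated dynamics" explanation.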
Related papers
- Neural Representations of Dynamic Visual Stimuli [36.04425924379253]
We show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI.
We show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model.
This work offers a novel framework for interpreting how the human brain processes dynamic visual information.
arXiv Detail & Related papers (2024-06-04T17:59:49Z)
- MindFormer: A Transformer Architecture for Multi-Subject Brain Decoding via fMRI [50.55024115943266]
We introduce a new Transformer architecture called MindFormer to generate fMRI-conditioned feature vectors.
MindFormer incorporates two key innovations: 1) a novel training strategy based on the IP-Adapter to extract semantically meaningful features from fMRI signals, and 2) a subject-specific token and linear layer that effectively capture individual differences in fMRI signals.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- Neural 3D decoding for human vision diagnosis [76.41771117405973]
We show how AI can go beyond the current state of the art by advancing from 2D visuals to visually plausible and functionally more comprehensive 3D visuals decoded from brain signals.
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject who was presented with a 2D image, and yields as output the corresponding 3D object visuals.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties [23.893490180665996]
We introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data.
Tested on a publicly available fMRI dataset, our method shows promising results.
Our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
arXiv Detail & Related papers (2024-02-02T17:34:25Z)
- MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion.
In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z)
- Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities [31.448924808940284]
We introduce a two-phase fMRI representation learning framework.
The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations.
The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder.
arXiv Detail & Related papers (2023-05-26T19:16:23Z)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
arXiv Detail & Related papers (2023-05-19T13:44:25Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)