NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
- URL: http://arxiv.org/abs/2410.19452v3
- Date: Sun, 15 Dec 2024 08:24:41 GMT
- Title: NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction
- Authors: Zixuan Gong, Guangyin Bao, Qi Zhang, Zhongwei Wan, Duoqian Miao, Shoujin Wang, Lei Zhu, Changwei Wang, Rongtao Xu, Liang Hu, Ke Liu, Yu Zhang
- Abstract summary: We propose NeuroClips, an innovative framework to decode high-fidelity and smooth video from fMRI.
NeuroClips utilizes a semantics reconstructor to reconstruct video keyframes, guiding semantic accuracy and consistency, and employs a perception reconstructor to capture low-level perceptual details.
NeuroClips achieves smooth high-fidelity video reconstruction of up to 6s at 8FPS, gaining significant improvements over state-of-the-art models in various metrics.
- Score: 29.030311713701295
- License:
- Abstract: Reconstruction of static visual stimuli from non-invasive brain activity (fMRI) has achieved great success, owing to advanced deep learning models such as CLIP and Stable Diffusion. However, research on fMRI-to-video reconstruction remains limited, since decoding the spatiotemporal perception of continuous visual experiences is formidably challenging. We contend that the key to addressing these challenges lies in accurately decoding both high-level semantics and low-level perception flows, as perceived by the brain in response to video stimuli. To this end, we propose NeuroClips, an innovative framework to decode high-fidelity and smooth video from fMRI. NeuroClips utilizes a semantics reconstructor to reconstruct video keyframes, guiding semantic accuracy and consistency, and employs a perception reconstructor to capture low-level perceptual details, ensuring video smoothness. During inference, it adopts a pre-trained text-to-video (T2V) diffusion model injected with both keyframes and low-level perception flows for video reconstruction. Evaluated on a publicly available fMRI-video dataset, NeuroClips achieves smooth high-fidelity video reconstruction of up to 6s at 8FPS, gaining significant improvements over state-of-the-art models in various metrics, e.g., a 128% improvement in SSIM and an 81% improvement in spatiotemporal metrics. Our project is available at https://github.com/gongzix/NeuroClips.
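The abstract reports its gains as relative improvements, for example 128% in SSIM, which means the new score is 2.28 times the old one (and 6s at 8FPS corresponds to 48 frames). As a reading aid, here is a minimal sketch of a simplified, single-window SSIM between two frames flattened to intensity lists, plus the relative-improvement percentage. Assumptions: standard SSIM averages the score over local (usually Gaussian-weighted) windows, so this global version is only illustrative; the abstract does not say which SSIM implementation NeuroClips used; and the helper names `global_ssim` and `relative_improvement` are hypothetical.

```python
def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole signal as one window.

    Illustrative only: standard SSIM averages this quantity over many
    local windows instead of using global statistics.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Stabilizing constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance/contrast/structure terms combined into the standard ratio.
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


def relative_improvement(new, old):
    """Percentage gain of `new` over `old`; 128.0 means new = 2.28 * old."""
    return (new - old) / old * 100.0
```

For example, `global_ssim(frame, frame)` returns 1.0 (up to rounding) for any frame, and any mismatch between two frames lowers the score.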
Related papers
- ContextMRI: Enhancing Compressed Sensing MRI through Metadata Conditioning [51.26601171361753]
We propose ContextMRI, a text-conditioned diffusion model for MRI that integrates granular metadata into the reconstruction process.
We show that increasing the fidelity of metadata, ranging from slice location and contrast to patient age, sex, and pathology, systematically boosts reconstruction performance.
(arXiv, 2025-01-08T05:15:43Z)
- MambaRecon: MRI Reconstruction with Structured State Space Models [30.506544165999564]
The advent of deep learning has catalyzed the development of cutting-edge methods for the expedited reconstruction of MRI scans.
We propose an innovative MRI reconstruction framework that employs structured state space models at its core, aimed at amplifying both long-range contextual sensitivity and reconstruction efficacy.
(arXiv, 2024-09-19T01:50:10Z)
- Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.04953215936574]
We propose a two-stage model named Mind-Animator to reconstruct human dynamic vision from brain activity.
During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI.
In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion.
(arXiv, 2024-05-06T08:56:41Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding using only one model.
(arXiv, 2024-04-11T15:46:42Z)
- FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training.
We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue.
We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch.
This improves ease of use and makes it possible to scale reconstruction in time to surgical videos of 5,000 frames or more, a more than tenfold improvement over the state of the art, while remaining agnostic to external tracking information.
(arXiv, 2024-03-18T19:13:02Z)
- NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties [23.893490180665996]
We introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data.
Tested on a publicly available fMRI dataset, our method shows promising results.
Our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
(arXiv, 2024-02-02T17:34:25Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
(arXiv, 2023-11-01T07:24:22Z)
- Progressive Fourier Neural Representation for Sequential Video Compilation [75.43041679717376]
Motivated by continual learning, this work investigates how to accumulate and transfer neural implicit representations for multiple complex video data over sequential encoding sessions.
We propose a novel method, Progressive Fourier Neural Representation (PFNR), that aims to find an adaptive and compact sub-module in Fourier space to encode videos in each training session.
We validate our PFNR method on the UVG8/17 and DAVIS50 video sequence benchmarks and achieve impressive performance gains over strong continual learning baselines.
(arXiv, 2023-06-20T06:02:19Z)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
(arXiv, 2023-05-19T13:44:25Z)
- NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds [60.1382112938132]
We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room.
NeRF performs impressively when rendering novel views similar to the input views, but struggles with novel views that differ significantly from the training views.
(arXiv, 2023-04-13T06:40:08Z)
- Natural scene reconstruction from fMRI signals using generative latent diffusion [1.90365714903665]
We present a two-stage scene reconstruction framework called "Brain-Diffuser".
In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model.
In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
(arXiv, 2023-03-09T15:24:26Z)
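The MindBridge entry in the list above describes one model serving several subjects, even though each subject's fMRI scan yields a different number of voxels. One common way to give every subject the same input size is adaptive pooling. The sketch below is a minimal illustration, assuming adaptive max pooling as the dimension-unifying step; the listing does not state how MindBridge handles differing voxel counts, and the function name `adaptive_max_pool_1d` is hypothetical.

```python
def adaptive_max_pool_1d(values, out_len):
    """Max-pool a 1D voxel vector of any length into exactly `out_len` bins.

    Bin i covers indices [floor(i*n/out_len), ceil((i+1)*n/out_len)),
    the same boundary rule PyTorch's AdaptiveMaxPool1d uses, so every bin
    is non-empty even when out_len > n.
    """
    n = len(values)
    if n == 0 or out_len <= 0:
        raise ValueError("need a non-empty input and a positive output length")
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = -((-(i + 1) * n) // out_len)  # ceiling division
        pooled.append(max(values[start:end]))
    return pooled


# Two hypothetical subjects with different voxel counts map to the same size.
subject_a = [0.2, 0.9, 0.1, 0.4, 0.8, 0.3, 0.5]  # 7 voxels
subject_b = [float(v) for v in range(10)]        # 10 voxels
shared_a = adaptive_max_pool_1d(subject_a, 4)
shared_b = adaptive_max_pool_1d(subject_b, 4)
```

Max pooling keeps the strongest response in each bin, which is one plausible choice when a shared model should see the most active voxels; averaging would be an equally simple alternative.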