Cinematic Mindscapes: High-quality Video Reconstruction from Brain
Activity
- URL: http://arxiv.org/abs/2305.11675v1
- Date: Fri, 19 May 2023 13:44:25 GMT
- Title: Cinematic Mindscapes: High-quality Video Reconstruction from Brain
Activity
- Authors: Zijiao Chen, Jiaxin Qing, Juan Helen Zhou
- Abstract summary: We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reconstructing human vision from brain activities has been an appealing task
that helps to understand our cognitive process. Even though recent research has
seen great success in reconstructing static images from non-invasive brain
recordings, work on recovering continuous visual experiences in the form of
videos is limited. In this work, we propose Mind-Video that learns
spatiotemporal information from continuous fMRI data of the cerebral cortex
progressively through masked brain modeling, multimodal contrastive learning
with spatiotemporal attention, and co-training with an augmented Stable
Diffusion model that incorporates network temporal inflation. We show that
high-quality videos of arbitrary frame rates can be reconstructed with
Mind-Video using adversarial guidance. The recovered videos were evaluated with
various semantic and pixel-level metrics. We achieved an average accuracy of
85% in semantic classification tasks and 0.19 in structural similarity index
(SSIM), outperforming the previous state-of-the-art by 45%. We also show that
our model is biologically plausible and interpretable, reflecting established
physiological processes.
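For readers unfamiliar with the structural similarity index (SSIM) reported above, the following is a minimal sketch of the formula only, not the authors' evaluation code. It assumes grayscale frames flattened to lists of floats in [0, 1] and computes a single global window per frame, whereas standard SSIM averages over local windows; the function names `global_ssim` and `mean_video_ssim` are hypothetical.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two flattened grayscale frames.

    Simplification (assumption): statistics are taken over the whole frame
    rather than over sliding local windows as in the standard metric.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    # Stabilising constants from the usual SSIM definition (K1=0.01, K2=0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_video_ssim(reconstructed, ground_truth):
    """Average the per-frame score over two equally long frame sequences."""
    if len(reconstructed) != len(ground_truth) or not reconstructed:
        raise ValueError("videos must be non-empty and have the same length")
    scores = [global_ssim(r, g) for r, g in zip(reconstructed, ground_truth)]
    return sum(scores) / len(scores)
```

A score of 1 means the two frames are identical; lower values indicate less structural agreement, which gives some context for the 0.19 figure quoted in the abstract.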
Related papers
- Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
- Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.291585611137355]
Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance.
This paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets.
We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model.
arXiv Detail & Related papers (2024-05-06T08:56:41Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z)
- NeuroCine: Decoding Vivid Video Sequences from Human Brain Activties [23.893490180665996]
We introduce NeuroCine, a novel dual-phase framework targeting the inherent challenges of decoding fMRI data.
Tested on a publicly available fMRI dataset, our method shows promising results.
Our attention analysis suggests that the model aligns with existing brain structures and functions, indicating its biological plausibility and interpretability.
arXiv Detail & Related papers (2024-02-02T17:34:25Z)
- UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latent for low-level information to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z)
- Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs [2.4366811507669124]
Integration of deep learning and neuroscience has led to improvements in the analysis of brain activity.
The reconstruction of visual experience from human brain activity is an area that has particularly benefited.
We examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction.
arXiv Detail & Related papers (2023-06-20T13:48:02Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory units, and inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning.
arXiv Detail & Related papers (2020-11-18T13:42:05Z)