Natural scene reconstruction from fMRI signals using generative latent
diffusion
- URL: http://arxiv.org/abs/2303.05334v2
- Date: Wed, 21 Jun 2023 07:15:19 GMT
- Title: Natural scene reconstruction from fMRI signals using generative latent
diffusion
- Authors: Furkan Ozcelik and Rufin VanRullen
- Abstract summary: We present a two-stage scene reconstruction framework called "Brain-Diffuser".
In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model.
In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
- Score: 1.90365714903665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural decoding research, one of the most intriguing topics is the
reconstruction of perceived natural images based on fMRI signals. Previous
studies have succeeded in re-creating different aspects of the visuals, such as
low-level properties (shape, texture, layout) or high-level features (category
of objects, descriptive semantics of scenes) but have typically failed to
reconstruct these properties together for complex scene images. Generative AI
has recently made a leap forward with latent diffusion models capable of
generating high-complexity images. Here, we investigate how to take advantage
of this innovative technology for brain decoding. We present a two-stage scene
reconstruction framework called "Brain-Diffuser". In the first stage,
starting from fMRI signals, we reconstruct images that capture low-level
properties and overall layout using a VDVAE (Very Deep Variational Autoencoder)
model. In the second stage, we use the image-to-image framework of a latent
diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text
and visual) features, to generate final reconstructed images. On the publicly
available Natural Scenes Dataset benchmark, our method outperforms previous
models both qualitatively and quantitatively. When applied to synthetic fMRI
patterns generated from individual ROI (region-of-interest) masks, our trained
model creates compelling "ROI-optimal" scenes consistent with neuroscientific
knowledge. Thus, the proposed methodology can have an impact on both applied
(e.g. brain-computer interface) and fundamental neuroscience.
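Both stages of the framework described above depend on mapping fMRI patterns to latent features (VDVAE latents in the first stage, multimodal CLIP-style features in the second). A minimal sketch of that decoding step, assuming a closed-form ridge regression and random toy arrays standing in for real fMRI data and latents (shapes and the regularization strength are illustrative, not the paper's):

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping fMRI patterns X to latent features Y."""
    d = X.shape[1]
    # Solve (X^T X + alpha I) W = X^T Y for the weight matrix W.
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Toy stand-ins: 100 trials, 50 voxels, 16 latent dimensions.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 50))   # fMRI patterns
Y_train = rng.standard_normal((100, 16))   # VDVAE or CLIP latents

W = fit_ridge(X_train, Y_train, alpha=10.0)
Y_pred = X_train @ W                        # predicted latents for each trial
```

The predicted latents would then be handed to the VDVAE decoder (stage 1) or used to condition the latent diffusion model (stage 2); those generative components are omitted here.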
Related papers
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding with a single model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latent for low-level information to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in image reconstruction, and reports image captioning results for the first time on the Natural Scenes Dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z)
- MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion.
In Stage 2, we use the CLIP visual features decoded from fMRI as supervisory information, and iteratively adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information.
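The Stage-2 alignment can be sketched as gradient descent on a least-squares objective. This toy version nudges a decoded latent directly toward a target CLIP visual feature, whereas the actual model backpropagates through Stable Diffusion; the variable names, step size, and step count are illustrative assumptions:

```python
import numpy as np

def align(z, c, lr=0.1, steps=100):
    """Gradient descent pulling latent z toward supervisory target c."""
    z = z.copy()
    for _ in range(steps):
        grad = 2 * (z - c)   # gradient of ||z - c||^2 with respect to z
        z -= lr * grad
    return z

c = np.array([1.0, -2.0, 0.5])   # stand-in for a CLIP visual feature decoded from fMRI
z0 = np.zeros(3)                 # stand-in for a Stage-1 decoded latent
z = align(z0, c)                 # converges to c for this convex objective
```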
arXiv Detail & Related papers (2023-08-08T13:28:34Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study of the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [8.299415606889024]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion.
In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information.
arXiv Detail & Related papers (2023-03-24T16:41:42Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Facial Image Reconstruction from Functional Magnetic Resonance Imaging via GAN Inversion with Improved Attribute Consistency [5.705640492618758]
We propose a new framework to reconstruct facial images from fMRI data.
The proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
arXiv Detail & Related papers (2022-07-03T11:18:35Z)
- Reconstruction of Perceived Images from fMRI Patterns and Semantic Brain Exploration using Instance-Conditioned GANs [1.6904374000330984]
We use an Instance-Conditioned GAN (IC-GAN) model to reconstruct images from fMRI patterns with both accurate semantic attributes and preserved low-level details.
We trained ridge regression models to predict instance features, noise vectors, and dense vectors of stimuli from corresponding fMRI patterns.
Then, we used the IC-GAN generator to reconstruct novel test images based on these fMRI-predicted variables.
arXiv Detail & Related papers (2022-02-25T13:51:00Z)
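The IC-GAN pipeline above (separate ridge models predicting instance features, noise vectors, and dense vectors from fMRI, whose outputs feed the generator) can be sketched as follows, with random toy data and a placeholder concatenation standing in for the real IC-GAN generator; all shapes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: 80 training trials, 40 voxels; three IC-GAN input variables.
X = rng.standard_normal((80, 40))
targets = {"instance": rng.standard_normal((80, 8)),
           "noise":    rng.standard_normal((80, 4)),
           "dense":    rng.standard_normal((80, 6))}

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

# One ridge model per IC-GAN input variable, as the summary describes.
weights = {name: ridge_fit(X, Y) for name, Y in targets.items()}

def generator_stub(instance, noise, dense):
    # Placeholder for the IC-GAN generator: just concatenates its inputs.
    return np.concatenate([instance, noise, dense])

# Decode a novel test pattern and "generate" from the predicted variables.
x_test = rng.standard_normal(40)
preds = {name: x_test @ W for name, W in weights.items()}
image = generator_stub(preds["instance"], preds["noise"], preds["dense"])
```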
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.