Mind Reader: Reconstructing complex images from brain activities
- URL: http://arxiv.org/abs/2210.01769v1
- Date: Fri, 30 Sep 2022 06:32:46 GMT
- Title: Mind Reader: Reconstructing complex images from brain activities
- Authors: Sikun Lin, Thomas Sprague, Ambuj K Singh
- Abstract summary: We focus on reconstructing the complex image stimuli from fMRI (functional magnetic resonance imaging) signals.
Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli rich in semantics.
We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images.
- Score: 16.78619734818198
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding how the brain encodes external stimuli and how these stimuli
can be decoded from the measured brain activities are long-standing and
challenging questions in neuroscience. In this paper, we focus on
reconstructing the complex image stimuli from fMRI (functional magnetic
resonance imaging) signals. Unlike previous works that reconstruct images with
single objects or simple shapes, our work aims to reconstruct image stimuli
that are rich in semantics, closer to everyday scenes, and can reveal more
perspectives. However, data scarcity of fMRI datasets is the main obstacle to
applying state-of-the-art deep learning models to this problem. We find that
incorporating an additional text modality is beneficial for the reconstruction
problem compared to directly translating brain signals to images. Therefore,
the modalities involved in our method are: (i) voxel-level fMRI signals, (ii)
observed images that trigger the brain signals, and (iii) textual description
of the images. To further address data scarcity, we leverage an aligned
vision-language latent space pre-trained on massive datasets. Instead of
training models from scratch to find a latent space shared by the three
modalities, we encode fMRI signals into this pre-aligned latent space. Then,
conditioned on embeddings in this space, we reconstruct images with a
generative model. The reconstructed images from our pipeline balance both
naturalness and fidelity: they are photo-realistic and capture the ground truth
image contents well.
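The two-stage idea in the abstract (map fMRI signals into an already-aligned embedding space, then condition a downstream model on that embedding) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the data are synthetic, the encoder is a plain linear map trained by stochastic gradient descent on squared error, and the generative model is swapped for a nearest-neighbor lookup over candidate embeddings. The names `fit_encoder`, `encode`, and `demo` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def encode(weights, voxels):
    # Linear map from a voxel vector into the (frozen) embedding space.
    return [dot(row, voxels) for row in weights]


def fit_encoder(voxel_rows, target_rows, lr=0.02, epochs=300):
    # Stochastic gradient descent on squared error between the predicted
    # embedding and the target embedding of the viewed stimulus.
    n_voxels, dim = len(voxel_rows[0]), len(target_rows[0])
    weights = [[0.0] * n_voxels for _ in range(dim)]
    for _ in range(epochs):
        for x, target in zip(voxel_rows, target_rows):
            pred = encode(weights, x)
            for j in range(dim):
                err = pred[j] - target[j]
                weights[j] = [w - lr * err * xi for w, xi in zip(weights[j], x)]
    return weights


def demo(seed=0, n_voxels=8, dim=4, n_train=40, n_test=5):
    rng = random.Random(seed)
    # Assumption: a hidden linear relation stands in for the true link between
    # brain responses and the pre-aligned embedding of each stimulus.
    true_map = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(dim)]

    def sample(n):
        xs = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n)]
        return xs, [encode(true_map, x) for x in xs]

    train_x, train_t = sample(n_train)
    test_x, test_t = sample(n_test)
    weights = fit_encoder(train_x, train_t)

    hits, sims = 0, []
    for i, x in enumerate(test_x):
        pred = encode(weights, x)
        # Stand-in for the generator: pick the closest candidate embedding.
        scores = [cosine(pred, t) for t in test_t]
        hits += scores.index(max(scores)) == i
        sims.append(scores[i])
    return min(sims), hits


if __name__ == "__main__":
    print(demo())
```

Because the embedding space is held fixed, only the small fMRI-to-embedding map has to be learned from the scarce brain data, which is the motivation the abstract gives for reusing a pre-trained space instead of learning a shared one from scratch.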
Related papers
- Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance [3.74142789780782]
We show how modern LDMs incorporate multi-modal guidance for structurally and semantically plausible image generations.
Brain-Streams maps fMRI signals from brain regions to appropriate embeddings.
We validate the reconstruction ability of Brain-Streams both quantitatively and qualitatively on a real fMRI dataset.
arXiv Detail & Related papers (2024-09-18T16:19:57Z)
- Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.
We show that our model captures the distinct functionalities of each region of the human visual system.
Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding with a single model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing [72.45257414889478]
We aim to reduce human workload by predicting connectivity between over-segmented neuron pieces.
We first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments expanding the whole fly brain.
We propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embedding.
arXiv Detail & Related papers (2024-01-05T19:45:12Z)
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
- UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latent for low-level information to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Facial Image Reconstruction from Functional Magnetic Resonance Imaging via GAN Inversion with Improved Attribute Consistency [5.705640492618758]
We propose a new framework to reconstruct facial images from fMRI data.
The proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
arXiv Detail & Related papers (2022-07-03T11:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.