Decoding Realistic Images from Brain Activity with Contrastive
Self-supervision and Latent Diffusion
- URL: http://arxiv.org/abs/2310.00318v1
- Date: Sat, 30 Sep 2023 09:15:22 GMT
- Title: Decoding Realistic Images from Brain Activity with Contrastive
Self-supervision and Latent Diffusion
- Authors: Jingyuan Sun, Mingxiao Li, Marie-Francine Moens
- Abstract summary: Reconstructing visual stimuli from human brain activities provides a promising opportunity to advance our understanding of the brain's visual system.
We propose a two-phase framework named Contrast and Diffuse (CnD) to decode realistic images from functional magnetic resonance imaging (fMRI) recordings.
- Score: 29.335943994256052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing visual stimuli from human brain activities provides a
promising opportunity to advance our understanding of the brain's visual system
and its connection with computer vision models. Although deep generative models
have been employed for this task, the challenge of generating high-quality
images with accurate semantics persists due to the intricate underlying
representations of brain signals and the limited availability of parallel data.
In this paper, we propose a two-phase framework named Contrast and Diffuse
(CnD) to decode realistic images from functional magnetic resonance imaging
(fMRI) recordings. In the first phase, we acquire representations of fMRI data
through self-supervised contrastive learning. In the second phase, the encoded
fMRI representations condition the diffusion model to reconstruct visual
stimuli through our proposed concept-aware conditioning method. Experimental
results show that CnD reconstructs highly plausible images on challenging
benchmarks. We also provide a quantitative interpretation of the connection
between the latent diffusion model (LDM) components and the human brain's
visual system. In summary, we present an effective approach for reconstructing
visual stimuli based on human brain activity and offer a novel framework to
understand the relationship between the diffusion model and the human brain
visual system.
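The first phase described above aligns fMRI representations with their paired stimuli via self-supervised contrastive learning. A minimal NumPy sketch of one common contrastive objective (a symmetric InfoNCE-style loss); the array shapes and encoder-free setup are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def info_nce_loss(fmri_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning fMRI and image embeddings.

    fmri_emb, img_emb: (batch, dim) arrays; row i of each is a positive pair.
    """
    # L2-normalize so dot products are cosine similarities
    f = fmri_emb / np.linalg.norm(fmri_emb, axis=1, keepdims=True)
    g = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = f @ g.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(f))                   # positives lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # cross-entropy in both directions (fMRI -> image and image -> fMRI)
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
batch, dim = 8, 16
emb = rng.normal(size=(batch, dim))
# Perfectly aligned embeddings score lower than randomly paired ones
aligned = info_nce_loss(emb, emb)
random_pair = info_nce_loss(emb, rng.normal(size=(batch, dim)))
print(aligned < random_pair)  # True
```

In the second phase, an embedding learned this way would condition the latent diffusion model's denoising steps; that conditioning mechanism is specific to CnD and is not reproduced here.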
Related papers
- Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance [3.74142789780782]
We show how modern LDMs incorporate multi-modal guidance for structurally and semantically plausible image generation.
Brain-Streams maps fMRI signals from brain regions to appropriate embeddings.
We validate the reconstruction ability of Brain-Streams both quantitatively and qualitatively on a real fMRI dataset.
arXiv Detail & Related papers (2024-09-18T16:19:57Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel semantic alignment method for multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used to condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z)
- Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing viewed images from human brain activity bridges human and computer vision through brain-computer interfaces.
We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z)
- UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latents that carry low-level information, generating realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z)
- Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities [31.448924808940284]
We introduce a two-phase fMRI representation learning framework.
The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations.
The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder.
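The masked auto-encoding idea in this entry's first phase can be sketched roughly as follows: hide a random subset of voxels and score the reconstruction only where the signal was hidden. The masking ratio, toy batch shape, and absence of an actual encoder/decoder are placeholder assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

def mask_voxels(signal, ratio=0.75, rng=rng):
    """Zero out a random subset of voxels; return masked signal and mask."""
    mask = rng.random(signal.shape) < ratio      # True where masked
    return np.where(mask, 0.0, signal), mask

def masked_mse(pred, target, mask):
    """Reconstruction loss computed only on the masked positions."""
    return ((pred - target)[mask] ** 2).mean()

voxels = rng.normal(size=(4, 128))               # toy fMRI batch
masked, mask = mask_voxels(voxels)
# A perfect reconstruction scores zero; predicting zeros everywhere
# scores roughly the variance of the masked voxels
print(masked_mse(voxels, voxels, mask))          # 0.0
print(masked_mse(masked, voxels, mask))          # ~1 for unit-variance data
```

In a real masked auto-encoder, an encoder would map the visible voxels to latents and a decoder would predict the masked ones; minimizing this masked loss is what drives the denoised representations the entry describes.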
arXiv Detail & Related papers (2023-05-26T19:16:23Z)
- Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding [0.0]
We present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding.
We show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations.
arXiv Detail & Related papers (2022-11-13T17:04:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.