MindDiffuser: Controlled Image Reconstruction from Human Brain Activity
with Semantic and Structural Diffusion
- URL: http://arxiv.org/abs/2303.14139v1
- Date: Fri, 24 Mar 2023 16:41:42 GMT
- Title: MindDiffuser: Controlled Image Reconstruction from Human Brain Activity
with Semantic and Structural Diffusion
- Authors: Yizhuo Lu, Changde Du, Dianpeng Wang and Huiguang He
- Abstract summary: We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion.
In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information.
- Score: 8.299415606889024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing visual stimuli from measured functional magnetic resonance
imaging (fMRI) has been a meaningful and challenging task. Previous studies
have successfully achieved reconstructions with structures similar to the
original images, such as the outlines and size of some natural images. However,
these reconstructions lack explicit semantic information and are difficult to
discern. In recent years, many studies have utilized multi-modal pre-trained
models with stronger generative capabilities to reconstruct images that are
semantically similar to the original ones. However, these images have
uncontrollable structural information such as position and orientation. To
address both of the aforementioned issues simultaneously, we propose a
two-stage image reconstruction model called MindDiffuser, utilizing Stable
Diffusion. In Stage 1, the VQ-VAE latent representations and the CLIP text
embeddings decoded from fMRI are put into the image-to-image process of Stable
Diffusion, which yields a preliminary image that contains semantic and
structural information. In Stage 2, we utilize the low-level CLIP visual
features decoded from fMRI as supervisory information, and continually adjust
the two features in Stage 1 through backpropagation to align the structural
information. The results of both qualitative and quantitative analyses
demonstrate that our proposed model has surpassed the current state-of-the-art
models in terms of reconstruction results on Natural Scenes Dataset (NSD).
Furthermore, the results of ablation experiments indicate that each component
of our model is effective for image reconstruction.
Related papers
- Double-Flow GAN model for the reconstruction of perceived faces from brain activities [13.707575848841405]
We proposed a novel reconstruction framework, which we called Double-Flow GAN.
We also designed a pretraining process that uses features extracted from images as conditions for making it possible to pretrain the conditional reconstruction model from fMRI.
Results showed that the proposed method is significant at accurately reconstructing multiple face attributes, outperforms the previous reconstruction models, and exhibited state-of-the-art reconstruction abilities.
arXiv Detail & Related papers (2023-12-12T18:07:57Z) - DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks.
DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content.
In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results.
For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
arXiv Detail & Related papers (2023-08-29T07:11:52Z) - MindDiffuser: Controlled Image Reconstruction from Human Brain Activity
with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion.
In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z) - Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as Controllable Mind Visual Model Diffusion (CMVDM)
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z) - Natural scene reconstruction from fMRI signals using generative latent
diffusion [1.90365714903665]
We present a two-stage scene reconstruction framework called Brain-Diffuser''
In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Vari Autoencoder) model.
In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
arXiv Detail & Related papers (2023-03-09T15:24:26Z) - Model-Guided Multi-Contrast Deep Unfolding Network for MRI
Super-resolution Reconstruction [68.80715727288514]
We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix.
In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.
arXiv Detail & Related papers (2022-09-15T03:58:30Z) - Facial Image Reconstruction from Functional Magnetic Resonance Imaging
via GAN Inversion with Improved Attribute Consistency [5.705640492618758]
We propose a new framework to reconstruct facial images from fMRI data.
The proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
arXiv Detail & Related papers (2022-07-03T11:18:35Z) - Federated Learning of Generative Image Priors for MRI Reconstruction [5.3963856146595095]
Multi-institutional efforts can facilitate training of deep MRI reconstruction models, albeit privacy risks arise during cross-site sharing of imaging data.
We introduce a novel method for MRI reconstruction based on Federated learning of Generative IMage Priors (FedGIMP)
FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and subject-specific injection of the imaging operator.
arXiv Detail & Related papers (2022-02-08T22:17:57Z) - Multi-modal Aggregation Network for Fast MR Imaging [85.25000133194762]
We propose a novel Multi-modal Aggregation Network, named MANet, which is capable of discovering complementary representations from a fully sampled auxiliary modality.
In our MANet, the representations from the fully sampled auxiliary and undersampled target modalities are learned independently through a specific network.
Our MANet follows a hybrid domain learning framework, which allows it to simultaneously recover the frequency signal in the $k$-space domain.
arXiv Detail & Related papers (2021-10-15T13:16:59Z) - Multi-institutional Collaborations for Improving Deep Learning-based
Magnetic Resonance Image Reconstruction Using Federated Learning [62.17532253489087]
Deep learning methods have been shown to produce superior performance on MR image reconstruction.
These methods require large amounts of data which is difficult to collect and share due to the high cost of acquisition and medical data privacy regulations.
We propose a federated learning (FL) based solution in which we take advantage of the MR data available at different institutions while preserving patients' privacy.
arXiv Detail & Related papers (2021-03-03T03:04:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.