Controllable Mind Visual Diffusion Model
- URL: http://arxiv.org/abs/2305.10135v3
- Date: Mon, 18 Dec 2023 09:09:28 GMT
- Title: Controllable Mind Visual Diffusion Model
- Authors: Bohan Zeng, Shanglin Li, Xuhui Liu, Sicheng Gao, Xiaolong Jiang, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang
- Abstract summary: Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
- Score: 58.83896307930354
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Brain signal visualization has emerged as an active research area, serving as
a critical interface between the human visual system and computer vision
models. Although diffusion models have shown promise in analyzing functional
magnetic resonance imaging (fMRI) data, including reconstructing high-quality
images consistent with original visual stimuli, their accuracy in extracting
semantic and silhouette information from brain signals remains limited. In this
regard, we propose a novel approach, referred to as Controllable Mind Visual
Diffusion Model (CMVDM). CMVDM extracts semantic and silhouette information
from fMRI data using attribute alignment and assistant networks. Additionally,
a residual block is incorporated to capture information beyond semantic and
silhouette features. We then leverage a control model to fully exploit the
extracted information for image synthesis, resulting in generated images that
closely resemble the visual stimuli in terms of semantics and silhouette.
Through extensive experimentation, we demonstrate that CMVDM outperforms
existing state-of-the-art methods both qualitatively and quantitatively.
Related papers
- Brain-Streams: fMRI-to-Image Reconstruction with Multi-modal Guidance [3.74142789780782]
We show how modern LDMs incorporate multi-modal guidance for structurally and semantically plausible image generations.
Brain-Streams maps fMRI signals from brain regions to appropriate embeddings.
We validate the reconstruction ability of Brain-Streams both quantitatively and qualitatively on a real fMRI dataset.
arXiv Detail & Related papers (2024-09-18T16:19:57Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
The model is designed to generate fMRI-conditioned feature vectors that can condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.
We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z)
- Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [82.13830107682232]
We propose a novel class of state-of-the-art (SOTA) generative model, which exhibits the capability to model intricate relationships.
We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images.
Two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information.
arXiv Detail & Related papers (2024-02-22T09:39:46Z)
- MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
arXiv Detail & Related papers (2024-02-03T05:48:50Z)
- Decoding Realistic Images from Brain Activity with Contrastive Self-supervision and Latent Diffusion [29.335943994256052]
Reconstructing visual stimuli from human brain activities provides a promising opportunity to advance our understanding of the brain's visual system.
We propose a two-phase framework named Contrast and Diffuse (CnD) to decode realistic images from functional magnetic resonance imaging (fMRI) recordings.
arXiv Detail & Related papers (2023-09-30T09:15:22Z)
- MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser.
In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion.
In Stage 2, we use the CLIP visual feature decoded from fMRI as supervisory information and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding [0.0]
We present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding.
We show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations.
arXiv Detail & Related papers (2022-11-13T17:04:05Z)
- Facial Image Reconstruction from Functional Magnetic Resonance Imaging via GAN Inversion with Improved Attribute Consistency [5.705640492618758]
We propose a new framework to reconstruct facial images from fMRI data.
The proposed framework accomplishes two goals: (1) reconstructing clear facial images from fMRI data and (2) maintaining the consistency of semantic characteristics.
arXiv Detail & Related papers (2022-07-03T11:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.