BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding
- URL: http://arxiv.org/abs/2302.12971v3
- Date: Mon, 15 May 2023 04:32:59 GMT
- Authors: Yulong Liu, Yongqiang Ma, Wei Zhou, Guibo Zhu, Nanning Zheng
- Abstract summary: BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Due to the lack of paired samples and the low signal-to-noise ratio of
functional MRI (fMRI) signals, reconstructing perceived natural images or
decoding their semantic contents from fMRI data are challenging tasks. In this
work, we propose, for the first time, a task-agnostic fMRI-based brain decoding
model, BrainCLIP, which leverages CLIP's cross-modal generalization ability to
bridge the modality gap between brain activity, image, and text. Our
experiments demonstrate that CLIP can act as a pivot for generic brain decoding
tasks, including zero-shot visual category decoding, fMRI-image/text
matching, and fMRI-to-image generation. Specifically, BrainCLIP aims to train a
mapping network that transforms fMRI patterns into a well-aligned CLIP
embedding space by combining visual and textual supervision. Our experiments
show that this combination can boost the decoding model's performance on
certain tasks like fMRI-text matching and fMRI-to-image generation. On the
zero-shot visual category decoding task, BrainCLIP achieves significantly
better performance than BraVL, a recently proposed multi-modal method
specifically designed for this task. BrainCLIP can also reconstruct visual
stimuli with high semantic fidelity and establishes a new state-of-the-art for
fMRI-based natural image reconstruction in terms of high-level semantic
features.
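The central idea of the abstract, training a mapping network that projects fMRI patterns into CLIP's embedding space under combined visual and textual supervision, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear mapper `W`, the voxel/embedding dimensions, the batch size, and the symmetric InfoNCE loss are all assumptions standing in for the paper's actual architecture and objective, and random vectors stand in for frozen CLIP image/text embeddings.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere, as CLIP does before matching.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    # Numerically stable log-softmax cross-entropy over rows.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def infonce(a, b, temperature=0.07):
    # Symmetric InfoNCE: matched (fMRI_i, CLIP_i) pairs are positives,
    # every other pair in the batch is a negative.
    logits = (a @ b.T) / temperature          # (batch, batch) cosine similarities
    labels = np.arange(len(a))
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
n_voxels, d_clip, batch = 4000, 512, 8        # hypothetical sizes
W = rng.normal(scale=0.01, size=(n_voxels, d_clip))   # hypothetical linear mapper

fmri = rng.normal(size=(batch, n_voxels))             # fMRI patterns (stand-in)
clip_img = l2_normalize(rng.normal(size=(batch, d_clip)))  # frozen CLIP image embeddings (stand-in)
clip_txt = l2_normalize(rng.normal(size=(batch, d_clip)))  # frozen CLIP text embeddings (stand-in)

# Map fMRI into CLIP space and combine visual + textual supervision.
z = l2_normalize(fmri @ W)
loss = infonce(z, clip_img) + infonce(z, clip_txt)

# Zero-shot category decoding then reduces to nearest CLIP text embedding.
pred = (z @ clip_txt.T).argmax(axis=1)
```

Once the mapper is trained, no task-specific head is needed: fMRI-image/text matching and zero-shot category decoding are both cosine-similarity lookups in the shared CLIP space, which is what makes the model task-agnostic.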
Related papers
- BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models (2024-06-10)
  Proposes BrainChat, a generative framework for rapidly decoding semantic information from brain activity. BrainChat implements fMRI question answering and fMRI captioning. It is highly flexible and can achieve high performance without image data, making it well suited to real-world scenarios with limited data.
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding (2024-05-28)
  Introduces MindFormer, a semantic alignment method for multi-subject fMRI signals. The model generates fMRI-conditioned feature vectors that can condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation. Experiments show that MindFormer generates semantically consistent images and text across different subjects.
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation (2024-03-27)
  Proposes directly modulating the generation process of diffusion models with fMRI signals. Trained on about 67,000 fMRI-image pairs from various individuals, the model achieves superior fMRI-to-image decoding capacity.
- fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding (2023-11-01)
  Proposes fMRI-PTE, an auto-encoder approach for fMRI pre-training. fMRI signals are transformed into unified 2D representations, ensuring consistent dimensions while preserving brain activity patterns. Contributions include the fMRI-PTE model, the data transformation, efficient training, a novel learning strategy, and the broad applicability of the approach.
- Disruptive Autoencoders: Leveraging Low-level Features for 3D Medical Image Pre-training (2023-07-31)
  Designs a pre-training framework for 3D radiology images. Disruptive Autoencoders reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The framework achieves state-of-the-art performance across multiple downstream tasks.
- DreamCatcher: Revealing the Language of the Brain with fMRI using GPT Embedding (2023-06-16)
  Proposes fMRI captioning, in which captions are generated from fMRI data to gain insight into visual perception. DreamCatcher consists of the Representation Space (RSE) and the RevEmbedding Decoder, which map fMRI signals into latent-space vectors and generate captions from them. Applications include understanding neural mechanisms, human-computer interaction, and enhancing learning and training processes.
- Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities (2023-05-26)
  Introduces a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations; the second phase tunes the feature learner to attend to the neural activation patterns most informative for visual reconstruction, guided by an image auto-encoder.
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment (2023-03-26)
  Introduces a unified framework that addresses both fMRI decoding and encoding: the model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images.
- Mind Reader: Reconstructing Complex Images from Brain Activities (2022-09-30)
  Focuses on reconstructing complex image stimuli from fMRI (functional magnetic resonance imaging) signals. Unlike previous works that reconstruct images with single objects or simple shapes, this work reconstructs image stimuli rich in semantics, and finds that incorporating an additional text modality benefits reconstruction compared to translating brain signals directly to images.
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences of its use.