BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models
- URL: http://arxiv.org/abs/2406.07584v1
- Date: Mon, 10 Jun 2024 12:06:15 GMT
- Title: BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models
- Authors: Wanaiu Huang,
- Abstract summary: This paper proposes BrainChat, a generative framework aimed at rapidly accomplishing semantic information decoding tasks from brain activity.
BrainChat implements fMRI question answering and fMRI captioning.
BrainChat is highly flexible and can achieve high performance without image data, making it better suited for real-world scenarios with limited data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic information is vital for human interaction, and decoding it from brain activity enables non-invasive clinical augmentative and alternative communication. While there has been significant progress in reconstructing visual images, few studies have focused on the language aspect. To address this gap, leveraging the powerful capabilities of the decoder-based vision-language pretrained model CoCa, this paper proposes BrainChat, a simple yet effective generative framework aimed at rapidly accomplishing semantic information decoding tasks from brain activity, including fMRI question answering and fMRI captioning. BrainChat employs the self-supervised approach of Masked Brain Modeling to encode sparse fMRI data, obtaining a more compact embedding representation in the latent space. Subsequently, BrainChat bridges the gap between modalities by applying contrastive loss, resulting in aligned representations of fMRI, image, and text embeddings. Furthermore, the fMRI embeddings are mapped to the generative Brain Decoder via cross-attention layers, where they guide the generation of textual content about fMRI in a regressive manner by minimizing caption loss. Empirically, BrainChat exceeds the performance of existing state-of-the-art methods in the fMRI captioning task and, for the first time, implements fMRI question answering. Additionally, BrainChat is highly flexible and can achieve high performance without image data, making it better suited for real-world scenarios with limited data.
Related papers
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method of multi-subject fMRI signals using so-called MindFormer.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning Stable Diffusion model for fMRI- to-image generation or large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for
Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion
Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latent for low-level information to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z) - DreamCatcher: Revealing the Language of the Brain with fMRI using GPT
Embedding [6.497816402045099]
We propose fMRI captioning, where captions are generated based on fMRI data to gain insight into visual perception.
DreamCatcher consists of the Representation Space (RSE) and the RevEmbedding Decoder, which transform fMRI into a latent space vectors generate captions.
fMRI-based captioning has diverse applications, including understanding neural mechanisms, Human-Computer Interaction, and enhancing learning and training processes.
arXiv Detail & Related papers (2023-06-16T07:55:20Z) - Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z) - Mind Reader: Reconstructing complex images from brain activities [16.78619734818198]
We focus on reconstructing the complex image stimuli from fMRI (functional magnetic resonance imaging) signals.
Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli rich in semantics.
We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images.
arXiv Detail & Related papers (2022-09-30T06:32:46Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.