MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
- URL: http://arxiv.org/abs/2405.18812v1
- Date: Wed, 29 May 2024 06:55:03 GMT
- Title: MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
- Authors: Ziqi Ren, Jie Li, Xuetong Xue, Xin Li, Fan Yang, Zhicheng Jiao, Xinbo Gao
- Abstract summary: Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge.
We introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity.
MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity.
- Score: 45.18716166499859
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge in the field of neuroscience research. Compared to merely predicting the viewed image itself, decoding brain activity into meaningful captions provides a higher-level interpretation and summarization of visual information, which naturally enhances the application flexibility in real-world situations. In this work, we introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity. Our MindSemantix explores a more ideal brain captioning paradigm by weaving LLMs into brain activity analysis, crafting a seamless, end-to-end Brain-Language Model. To effectively capture semantic information from brain responses, we propose Brain-Text Transformer, utilizing a Brain Q-Former as its core architecture. It integrates a pre-trained brain encoder with a frozen LLM to achieve multi-modal alignment of brain-vision-language and establish a robust brain-language correspondence. To enhance the generalizability of neural representations, we pre-train our brain encoder on a large-scale, cross-subject fMRI dataset using self-supervised learning techniques. MindSemantix provides more feasibility to downstream brain decoding tasks such as stimulus reconstruction. Conditioned by MindSemantix captioning, our framework facilitates this process by integrating with advanced generative models like Stable Diffusion and excels in understanding brain visual perception. MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity. This approach has demonstrated substantial quantitative improvements over prior art. Our code will be released.
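The abstract describes a Brain Q-Former that bridges a pre-trained brain encoder and a frozen LLM: a small set of learnable query vectors cross-attends to brain-encoder features and is projected into the LLM's embedding space. The following is a minimal NumPy sketch of that general Q-Former-style bridging idea only; all names, dimensions, and weights are illustrative placeholders, not the authors' actual architecture or parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def qformer_bridge(brain_feats, queries, W_q, W_k, W_v, W_out):
    """One cross-attention step: learnable queries attend to brain-encoder
    features, and the result is projected into the LLM embedding space.
    This is a single-head, single-layer simplification of a Q-Former block."""
    Q = queries @ W_q            # (n_queries, d_model)
    K = brain_feats @ W_k        # (n_voxels, d_model)
    V = brain_feats @ W_v        # (n_voxels, d_model)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (n_queries, n_voxels)
    return (attn @ V) @ W_out    # (n_queries, d_llm): soft prompt for the LLM

# Illustrative sizes: 500 voxel features in, 32 query tokens out.
rng = np.random.default_rng(0)
d_brain, d_model, d_llm = 256, 64, 128
n_voxels, n_queries = 500, 32
brain_feats = rng.standard_normal((n_voxels, d_brain))   # frozen-encoder output
queries = rng.standard_normal((n_queries, d_model))      # learnable queries
W_q = rng.standard_normal((d_model, d_model)) * 0.02
W_k = rng.standard_normal((d_brain, d_model)) * 0.02
W_v = rng.standard_normal((d_brain, d_model)) * 0.02
W_out = rng.standard_normal((d_model, d_llm)) * 0.02

prefix = qformer_bridge(brain_feats, queries, W_q, W_k, W_v, W_out)
print(prefix.shape)  # (32, 128)
```

In a trained system, the resulting query outputs would be prepended as soft-prompt tokens to the frozen LLM's input so that caption generation is conditioned on brain activity; here only the shape-level data flow is shown.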
Related papers
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI). Recent neuroimaging studies provide compelling evidence of alignment between the computational representations of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models [0.0]
This paper proposes BrainChat, a generative framework aimed at rapidly accomplishing semantic information decoding tasks from brain activity.
BrainChat implements fMRI question answering and fMRI captioning.
BrainChat is highly flexible and can achieve high performance without image data, making it better suited for real-world scenarios with limited data.
arXiv Detail & Related papers (2024-06-10T12:06:15Z)
- Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction [8.63068449082585]
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
arXiv Detail & Related papers (2024-04-30T10:41:23Z)
- Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose Chat2Brain, a method that combines LLMs with a basic text-to-image model, Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex tasks of text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
- BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
- Explainable fMRI-based Brain Decoding via Spatial Temporal-pyramid Graph Convolutional Network [0.8399688944263843]
Existing machine learning methods for fMRI-based brain decoding either suffer from low classification performance or poor explainability.
We propose a biologically inspired architecture, Spatial Temporal-pyramid Graph Convolutional Network (STpGCN), to capture the spatial-temporal graph representation of functional brain activities.
We conduct extensive experiments on fMRI data under 23 cognitive tasks from the Human Connectome Project (HCP) S1200 release.
arXiv Detail & Related papers (2022-10-08T12:14:33Z)
- Visual representations in the human brain are aligned with large language models [7.779248296336383]
We show that large language models (LLMs) are beneficial for modelling the complex visual information extracted by the brain from natural scenes.
We then train deep neural network models to transform image inputs into LLM representations.
arXiv Detail & Related papers (2022-09-23T17:34:33Z)
- Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and linguistic encoders trained multimodally are more brain-like than their unimodally trained counterparts.
arXiv Detail & Related papers (2022-08-17T12:36:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.