Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction
- URL: http://arxiv.org/abs/2404.19438v4
- Date: Mon, 14 Oct 2024 09:23:48 GMT
- Title: Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction
- Authors: Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng,
- Abstract summary: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
- Score: 8.63068449082585
- License:
- Abstract: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.
Related papers
- LLM4Brain: Training a Large Language Model for Brain Video Understanding [9.294352205183726]
We introduce an LLM-based approach for reconstructing visual-semantic information from fMRI signals elicited by video stimuli.
We employ fine-tuning techniques on an fMRI encoder equipped with adaptors to transform brain responses into latent representations aligned with the video stimuli.
In particular, we integrate self-supervised domain adaptation methods to enhance the alignment between visual-semantic information and brain responses.
arXiv Detail & Related papers (2024-09-26T15:57:08Z) - BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models [0.0]
This paper proposes BrainChat, a generative framework aimed at rapidly accomplishing semantic information decoding tasks from brain activity.
BrainChat implements fMRI question answering and fMRI captioning.
BrainChat is highly flexible and can achieve high performance without image data, making it better suited for real-world scenarios with limited data.
arXiv Detail & Related papers (2024-06-10T12:06:15Z) - MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model [45.18716166499859]
Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge.
We introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity.
MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity.
arXiv Detail & Related papers (2024-05-29T06:55:03Z) - MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method of multi-subject fMRI signals using so-called MindFormer.
This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning Stable Diffusion model for fMRI- to-image generation or large language model (LLM) for fMRI-to-text generation.
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z) - Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.
We show that our model captures the distinct functionalities of each region of human vision system.
Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for
Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z) - Decoding Visual Neural Representations by Multimodal Learning of
Brain-Visual-Linguistic Features [9.783560855840602]
This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features.
We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models.
In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
arXiv Detail & Related papers (2022-10-13T05:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.