Related papers: Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings

Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings

URL: http://arxiv.org/abs/2404.19438v3
Date: Wed, 22 May 2024 17:21:20 GMT
Title: Neuro-Vision to Language: Enhancing Visual Reconstruction and Language Interaction through Brain Recordings
Authors: Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng,
Abstract summary: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
Score: 8.63068449082585
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

Related papers

BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings [10.966252877363512]
We introduce BrainFLORA, a unified framework for integrating cross-modal neuroimaging data to construct a shared neural representation.<n>Our approach leverages multimodal large language models (MLLMs) augmented with modality-specific adapters and task decoders, achieving state-of-the-art performance in joint-subject visual retrieval.<n>BrainFLORA offers novel implications for cognitive neuroscience and brain-computer interfaces (BCIs)
arXiv Detail & Related papers (2025-07-13T18:56:17Z)
Coherent Language Reconstruction from Brain Recordings with Flexible Multi-Modal Input Stimuli [5.589479682782169]
Decoding thoughts from brain activity offers valuable insights into human cognition and enables promising applications in brain-computer interaction.<n>We propose a unified and flexible framework for reconstructing coherent language from brain recordings elicited by diverse input modalities.
arXiv Detail & Related papers (2025-05-15T14:46:45Z)
Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion [0.11249583407496218]
This paper presents a novel approach towards creating a foundational model for aligning neural data and visual stimuli across multimodal representationsof brain activity by leveraging contrastive learning. We used electroencephalography (EEG), magnetoencephalography (MEG), and functional magnetic resonance imaging (fMRI) data. Our framework's capabilities are demonstrated through three key experiments: decoding visual information from neural data, encoding images into neural representations, and converting between neural modalities.
arXiv Detail & Related papers (2024-11-14T12:27:27Z)
Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps. We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%.
arXiv Detail & Related papers (2024-11-11T16:51:17Z)
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model [45.18716166499859]
Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge. We introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity. MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity.
arXiv Detail & Related papers (2024-05-29T06:55:03Z)
MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce a novel semantic alignment method of multi-subject fMRI signals using so-called MindFormer. This model is specifically designed to generate fMRI-conditioned feature vectors that can be used for conditioning Stable Diffusion model for fMRI- to-image generation or large language model (LLM) for fMRI-to-text generation. Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject. We show that our model captures the distinct functionalities of each region of human vision system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals. Currently, brain decoding is confined to a per-subject-per-model paradigm. We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training. Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns. Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions. Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline. We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding. Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model. It bridges the modality gap between brain activity, image, and text. BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z)
Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features [9.783560855840602]
This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features. We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models. In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
arXiv Detail & Related papers (2022-10-13T05:49:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.