UMBRAE: Unified Multimodal Brain Decoding
- URL: http://arxiv.org/abs/2404.07202v2
- Date: Thu, 18 Jul 2024 12:30:48 GMT
- Title: UMBRAE: Unified Multimodal Brain Decoding
- Authors: Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue
- Abstract summary: We propose UMBRAE, a unified approach to multimodal decoding of brain signals.
We introduce an efficient universal brain encoder for multimodal-brain alignment.
We also introduce a cross-subject training strategy that maps subject-specific features to a common feature space.
- Score: 43.6339793925953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address prevailing challenges in brain-powered research, departing from the observation that existing work hardly recovers accurate spatial information and requires subject-specific models. To address these challenges, we propose UMBRAE, a unified approach to multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment and recover object descriptions at multiple levels of granularity from a subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy that maps subject-specific features to a common feature space. This allows a model to be trained on multiple subjects without extra resources and even yields superior results compared to subject-specific models. Further, we demonstrate that this supports weakly-supervised adaptation to new subjects with only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results in the newly introduced tasks but also outperforms existing methods in well-established ones. To assess our method, we construct and share with the community a comprehensive brain understanding benchmark, BrainHub. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.
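The cross-subject strategy described in the abstract can be pictured as subject-specific tokenizers that project each subject's voxels into a common token space, followed by a shared, subject-agnostic encoder whose outputs are aligned with image features and later consumed by an MLLM. Below is a minimal sketch under those assumptions; module names, sizes, and the MSE alignment objective are illustrative, not UMBRAE's actual implementation.

```python
# Hypothetical cross-subject brain encoder: per-subject linear tokenizers map
# voxel vectors of different lengths into a shared token space; a single
# transformer encoder is then trained across all subjects.
import torch
import torch.nn as nn

class CrossSubjectBrainEncoder(nn.Module):
    def __init__(self, voxel_dims: dict, n_tokens: int = 64, d_model: int = 768):
        super().__init__()
        # Subject-specific mapping into the common feature space.
        self.tokenizers = nn.ModuleDict({
            sid: nn.Linear(dim, n_tokens * d_model) for sid, dim in voxel_dims.items()
        })
        self.n_tokens, self.d_model = n_tokens, d_model
        # Shared, subject-agnostic encoder.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, voxels: torch.Tensor, subject_id: str) -> torch.Tensor:
        tokens = self.tokenizers[subject_id](voxels)
        tokens = tokens.view(voxels.size(0), self.n_tokens, self.d_model)
        return self.encoder(tokens)  # (batch, n_tokens, d_model)

# Alignment sketch: match brain tokens to features from a frozen image encoder,
# so the aligned tokens can be handed to a multimodal LLM at inference time.
def alignment_loss(brain_tokens: torch.Tensor, image_tokens: torch.Tensor) -> torch.Tensor:
    return nn.functional.mse_loss(brain_tokens, image_tokens)

# Example usage with made-up voxel counts for two subjects.
model = CrossSubjectBrainEncoder({"subj01": 15724, "subj02": 14278})
out = model(torch.randn(2, 15724), subject_id="subj01")
```

Because only the tokenizers depend on the subject, adapting to a new subject would amount to fitting a new tokenizer against the frozen shared encoder, which is consistent with the weakly-supervised adaptation described in the abstract.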
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system.
In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z) - BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation [6.5388528484686885]
This study introduces a novel approach towards the creation of medical foundation models.
Our method involves a novel two-stage pretraining approach using vision transformers.
BrainFounder demonstrates a significant performance gain, surpassing previous winning solutions.
arXiv Detail & Related papers (2024-06-14T19:49:45Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding with only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI [32.40827290083577]
Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system.
Previous approaches primarily employ subject-specific models, which are sensitive to training sample size.
We propose shallow subject-specific adapters to map cross-subject fMRI data into unified representations.
During training, we leverage both visual and textual supervision for multi-modal brain decoding.
arXiv Detail & Related papers (2024-03-11T01:18:49Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for the optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Retinotopy Inspired Brain Encoding Model and the All-for-One Training Recipe [14.943061215875655]
We pre-trained a brain encoding model using over one million data points from five public datasets spanning three imaging modalities.
We demonstrate the effectiveness of the pre-trained model as a drop-in replacement for commonly used vision backbone models.
arXiv Detail & Related papers (2023-07-26T08:06:40Z) - MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction [0.0]
In brain mapping, learning to automatically parse images to build representations of both small-scale features and global properties is a crucial and open challenge.
Our benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large section of mouse brain.
We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures.
arXiv Detail & Related papers (2023-01-01T04:54:03Z) - Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like than unimodal ones.
arXiv Detail & Related papers (2022-08-17T12:36:26Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise (a schematic form is sketched below).
We demonstrate the usefulness of our approach first on fMRI data, where our model shows improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
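The MultiView ICA summary above describes each subject's data as a subject-specific mixing of shared independent sources plus noise. A minimal schematic of that generative model, written with illustrative symbols (the exact noise placement and priors are assumptions, not taken from the paper):

```latex
% Schematic MultiView ICA generative model (illustrative notation):
%   x_i : observed data for subject i,   A_i : subject-specific mixing matrix,
%   s   : independent sources shared across subjects,   n_i : subject-specific noise.
\[
  x_i = A_i\, s + n_i, \qquad i = 1, \dots, m,
\]
\[
  s \sim \prod_{k} p(s_k), \qquad n_i \sim \mathcal{N}(0, \sigma^2 I).
\]
```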