Multimodal foundation models are better simulators of the human brain
- URL: http://arxiv.org/abs/2208.08263v1
- Date: Wed, 17 Aug 2022 12:36:26 GMT
- Title: Multimodal foundation models are better simulators of the human brain
- Authors: Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan
Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen
- Abstract summary: We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like than unimodal ones.
- Score: 65.10501322822881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning, especially large-scale multimodal pre-training, has
developed rapidly over the past few years and driven some of the greatest
advances in artificial intelligence (AI). Despite its effectiveness,
understanding the underlying mechanisms of multimodal pre-training models
remains a grand challenge. Revealing the explainability of such models is
likely to enable breakthroughs in novel learning paradigms in the AI field. To
this end, given
the multimodal nature of the human brain, we propose to explore the
explainability of multimodal learning models with the aid of non-invasive brain
imaging technologies such as functional magnetic resonance imaging (fMRI).
Concretely, we first present a newly-designed multimodal foundation model
pre-trained on 15 million image-text pairs, which has shown strong multimodal
understanding and generalization abilities in a variety of cognitive downstream
tasks. Further, from the perspective of neural encoding (based on our
foundation model), we find that both visual and lingual encoders trained
multimodally are more brain-like than unimodal ones. In particular, we
identify a number of brain regions where multimodally trained encoders
demonstrate better neural encoding performance. This is consistent with
findings from existing studies of multi-sensory integration in the brain.
Therefore, we believe that multimodal foundation models are more suitable tools
for neuroscientists to study the multimodal signal processing mechanisms in the
human brain. Our findings also demonstrate the potential of multimodal
foundation models as ideal computational simulators to promote both
AI-for-brain and brain-for-AI research.
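The neural encoding analysis is only summarized in this abstract. As a rough illustration, below is a minimal sketch of a standard voxel-wise encoding comparison, assuming stimulus features have already been extracted from a multimodally trained and a unimodally trained encoder; the file names, array shapes, and ridge-regression setup are illustrative assumptions, not the authors' released pipeline.

```python
# Minimal voxel-wise neural encoding sketch (illustrative; not the authors' code).
# Assumes stimulus features from two encoders and fMRI responses of shape
# (n_stimuli, n_voxels) are already available.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def encoding_score(features, voxels, n_splits=5):
    """Cross-validated Pearson r between predicted and measured voxel
    responses, averaged over folds (one score per voxel)."""
    scores = np.zeros((n_splits, voxels.shape[1]))
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for i, (tr, te) in enumerate(kfold.split(features)):
        model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(features[tr], voxels[tr])
        pred = model.predict(features[te])
        # Pearson r per voxel between prediction and held-out response
        pred_c = pred - pred.mean(axis=0)
        true_c = voxels[te] - voxels[te].mean(axis=0)
        scores[i] = (pred_c * true_c).sum(axis=0) / (
            np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0) + 1e-8)
    return scores.mean(axis=0)

# Hypothetical inputs: features for the same stimuli from a multimodally and a
# unimodally trained encoder, plus the corresponding fMRI responses.
multimodal_feats = np.load("multimodal_features.npy")  # (n_stimuli, d1), assumed file
unimodal_feats = np.load("unimodal_features.npy")      # (n_stimuli, d2), assumed file
fmri = np.load("fmri_responses.npy")                   # (n_stimuli, n_voxels), assumed file

gain = encoding_score(multimodal_feats, fmri) - encoding_score(unimodal_feats, fmri)
print("voxels better predicted by the multimodal encoder:", int((gain > 0).sum()))
```

Under these assumptions, voxels (or regions, once voxels are grouped anatomically) with a positive score gain correspond to the "more brain-like" behaviour of multimodally trained encoders described in the abstract.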
Related papers
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons (ANs) with functional brain networks (FBNs).
This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- Automated Ensemble Multimodal Machine Learning for Healthcare [52.500923923797835]
We introduce a multimodal framework, AutoPrognosis-M, that enables the integration of structured clinical (tabular) data and medical imaging using automated machine learning.
AutoPrognosis-M incorporates 17 imaging models, including convolutional neural networks and vision transformers, and three distinct multimodal fusion strategies.
arXiv Detail & Related papers (2024-07-25T17:46:38Z)
- Revealing Vision-Language Integration in the Brain with Multimodal Networks [21.88969136189006]
We use (multi)modal deep neural networks (DNNs) to probe for sites of multimodal integration in the human brain by predicting stereoencephalography (SEEG) recordings taken while human subjects watched movies.
We operationalize sites of multimodal integration as regions where a multimodal vision-language model predicts recordings better than unimodal language, unimodal vision, or linearly-integrated language-vision models (a minimal sketch of this criterion appears after this list).
arXiv Detail & Related papers (2024-06-20T16:43:22Z)
- Foundations of Multisensory Artificial Intelligence [32.56967614091527]
This thesis aims to advance the machine learning foundations of multisensory AI.
In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task.
In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks.
arXiv Detail & Related papers (2024-04-29T14:45:28Z)
- Vision-Language Integration in Multimodal Video Transformers (Partially) Aligns with the Brain [5.496000639803771]
We present a promising approach for probing a pre-trained multimodal video transformer model by leveraging neuroscientific evidence of multimodal information processing in the brain.
We find evidence that vision enhances masked prediction performance during language processing, providing support that cross-modal representations in models can benefit individual modalities.
We show that the brain alignment of the pre-trained joint representation can be improved by fine-tuning using a task that requires vision-language inferences.
arXiv Detail & Related papers (2023-11-13T21:32:37Z)
- Incomplete Multimodal Learning for Complex Brain Disorders Prediction [65.95783479249745]
We propose a new incomplete multimodal data integration approach that employs transformers and generative adversarial networks.
We apply our new method to predict cognitive degeneration and disease outcomes using multimodal imaging and genetic data from the Alzheimer's Disease Neuroimaging Initiative cohort.
arXiv Detail & Related papers (2023-05-25T16:29:16Z)
- Self-supervised multimodal neuroimaging yields predictive representations for a spectrum of Alzheimer's phenotypes [27.331511924585023]
This work presents a novel multi-scale coordinated framework for learning multiple representations from multimodal neuroimaging data.
We propose a general taxonomy of informative inductive biases to capture unique and joint information in multimodal self-supervised fusion.
We show that self-supervised models reveal disorder-relevant brain regions and multimodal links without access to the labels during pre-training.
arXiv Detail & Related papers (2022-09-07T01:37:19Z)
- Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness [23.86633372513335]
We describe the desiderata of a multimodal language called Brainish.
Brainish consists of words, images, audio, and sensations combined in representations that the Conscious Turing Machine's processors use to communicate.
arXiv Detail & Related papers (2022-04-14T00:35:52Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
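As referenced in the entry for "Revealing Vision-Language Integration in the Brain with Multimodal Networks" above, the stated criterion for a multimodal integration site reduces to a per-electrode comparison of prediction scores. The sketch below assumes such scores have already been computed for each model; the electrode names, score values, and data layout are hypothetical, not the paper's exact models or metrics.

```python
# Illustrative sketch of the "multimodal integration site" criterion described
# in the entry above: an electrode counts as an integration site when the
# multimodal model predicts its recordings better than every unimodal or
# linearly-integrated baseline. Scores here are placeholders, not paper results.
from typing import Dict, List

def integration_sites(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """scores maps electrode -> {model name: held-out prediction score}."""
    sites = []
    for electrode, s in scores.items():
        best_baseline = max(s["language_only"], s["vision_only"], s["linear_vision_language"])
        if s["multimodal"] > best_baseline:
            sites.append(electrode)
    return sites

# Hypothetical example input (placeholder values).
example = {
    "e01": {"multimodal": 0.42, "language_only": 0.30, "vision_only": 0.28, "linear_vision_language": 0.35},
    "e02": {"multimodal": 0.21, "language_only": 0.25, "vision_only": 0.10, "linear_vision_language": 0.26},
}
print(integration_sites(example))  # -> ['e01']
```

Under this operationalization, requiring the multimodal model to beat the linearly-integrated baseline, not just the unimodal ones, is what separates genuine cross-modal integration from a simple additive combination of the two modalities.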
This list is automatically generated from the titles and abstracts of the papers on this site.