Decoding Visual Neural Representations by Multimodal Learning of
Brain-Visual-Linguistic Features
- URL: http://arxiv.org/abs/2210.06756v2
- Date: Thu, 30 Mar 2023 15:27:33 GMT
- Authors: Changde Du, Kaicheng Fu, Jinpeng Li, Huiguang He
- Abstract summary: This paper presents a generic neural decoding method called BraVL that uses multimodal learning of brain-visual-linguistic features.
We focus on modeling the relationships between brain, visual and linguistic features via multimodal deep generative models.
In particular, our BraVL model can be trained under various semi-supervised scenarios to incorporate the visual and textual features obtained from the extra categories.
- Score: 9.783560855840602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding human visual neural representations is a challenging task with great
scientific significance in revealing vision-processing mechanisms and
developing brain-like intelligent machines. Most existing methods are difficult
to generalize to novel categories that have no corresponding neural data for
training. The two main reasons are 1) the under-exploitation of the multimodal
semantic knowledge underlying the neural data and 2) the small number of paired
(stimuli-responses) training data. To overcome these limitations, this paper
presents a generic neural decoding method called BraVL that uses multimodal
learning of brain-visual-linguistic features. We focus on modeling the
relationships between brain, visual and linguistic features via multimodal deep
generative models. Specifically, we leverage the mixture-of-product-of-experts
formulation to infer a latent code that enables a coherent joint generation of
all three modalities. To learn a more consistent joint representation and
improve the data efficiency in the case of limited brain activity data, we
exploit both intra- and inter-modality mutual information maximization
regularization terms. In particular, our BraVL model can be trained under
various semi-supervised scenarios to incorporate the visual and textual
features obtained from the extra categories. Finally, we construct three
trimodal matching datasets, and the extensive experiments lead to some
interesting conclusions and cognitive insights: 1) decoding novel visual
categories from human brain activity is practically possible with good
accuracy; 2) decoding models using the combination of visual and linguistic
features perform much better than those using either of them alone; 3) visual
perception may be accompanied by linguistic influences to represent the
semantics of visual stimuli. Code and data: https://github.com/ChangdeDu/BraVL.
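The mixture-of-product-of-experts idea mentioned in the abstract can be illustrated with a small sketch. This is not taken from the BraVL repository. It assumes that each modality provides a one-dimensional Gaussian posterior over the latent code, that the mixture runs over all non-empty subsets of modalities with uniform weights, and that the function names and the example means and variances are hypothetical values chosen only for illustration.

```python
import itertools
import math
import random


def product_of_experts(experts):
    """Combine 1-D Gaussian experts, each given as (mean, variance).

    The product of Gaussians is Gaussian: precisions add, and the mean
    is the precision-weighted average of the expert means.
    """
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def mopoe_components(experts_by_modality):
    """Build one product-of-experts Gaussian per non-empty modality subset.

    Assumption: the empty subset (prior only) is left out.
    """
    names = sorted(experts_by_modality)
    components = {}
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            chosen = [experts_by_modality[name] for name in subset]
            components[subset] = product_of_experts(chosen)
    return components


def sample_mopoe(components, rng):
    """Draw a latent sample: pick a subset uniformly, then sample its Gaussian."""
    subset = rng.choice(sorted(components))
    mean, var = components[subset]
    return subset, rng.gauss(mean, math.sqrt(var))


# Hypothetical per-modality posteriors (mean, variance), not real data.
experts = {"brain": (0.0, 1.0), "visual": (2.0, 1.0), "text": (4.0, 2.0)}
components = mopoe_components(experts)

if __name__ == "__main__":
    for subset, (mean, var) in components.items():
        print(subset, round(mean, 3), round(var, 3))
    print(sample_mopoe(components, random.Random(0)))
```

Because every subset contributes its own component, a latent code can still be inferred when only some modalities are observed, which is one plausible reading of how visual and textual features from extra categories without brain data could be used during training.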
Related papers
- Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show improved predictive semantic accuracy by 43%.
(2024-11-11T16:51:17Z)
- Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction [8.63068449082585]
Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition.
Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D.
We have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development.
(2024-04-30T10:41:23Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
(2024-04-11T15:46:42Z)
- Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing [72.45257414889478]
We aim to reduce human workload by predicting connectivity between over-segmented neuron pieces.
We first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments expanding the whole fly brain.
We propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embedding.
(2024-01-05T19:45:12Z)
- Investigating the Encoding of Words in BERT's Neurons using Feature Textualization [11.943486282441143]
We propose a technique to produce representations of neurons in embedding word space.
We find that the produced representations can provide insights about the encoded knowledge in individual neurons.
(2023-11-14T15:21:49Z)
- Retinotopy Inspired Brain Encoding Model and the All-for-One Training Recipe [14.943061215875655]
We pre-trained a brain encoding model using over one million data points from five public datasets spanning three imaging modalities.
We demonstrate the effectiveness of the pre-trained model as a drop-in replacement for commonly used vision backbone models.
(2023-07-26T08:06:40Z)
- SNeL: A Structured Neuro-Symbolic Language for Entity-Based Multimodal Scene Understanding [0.0]
We introduce SNeL (Structured Neuro-symbolic Language), a versatile query language designed to facilitate nuanced interactions with neural networks processing multimodal data.
SNeL's expressive interface enables the construction of intricate queries, supporting logical and arithmetic operators, comparators, nesting, and more.
Our evaluations demonstrate SNeL's potential to reshape the way we interact with complex neural networks.
(2023-06-09T17:01:51Z)
- Multimodal foundation models are better simulators of the human brain [65.10501322822881]
We present a newly-designed multimodal foundation model pre-trained on 15 million image-text pairs.
We find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
(2022-08-17T12:36:26Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
(2021-12-02T12:45:46Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose a CogAlign approach to integrate cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
(2021-06-10T07:10:25Z)
- Compositional Explanations of Neurons [52.71742655312625]
We describe a procedure for explaining neurons in deep representations by identifying compositional logical concepts.
We use this procedure to answer several questions on interpretability in models for vision and natural language processing.
(2020-06-24T20:37:05Z)