MindGPT: Interpreting What You See with Non-invasive Brain Recordings
- URL: http://arxiv.org/abs/2309.15729v1
- Date: Wed, 27 Sep 2023 15:35:20 GMT
- Title: MindGPT: Interpreting What You See with Non-invasive Brain Recordings
- Authors: Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan
- Abstract summary: We introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural language from fMRI signals.
Our experiments show that the generated word sequences faithfully represent the visual information conveyed in the seen stimuli.
- Score: 24.63828455553959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding seen visual content from non-invasive brain recordings has
important scientific and practical value. Efforts have been made to recover
seen images from brain signals; however, most existing approaches cannot
faithfully reflect the visual content due to insufficient image quality or
semantic mismatches. Compared with reconstructing pixel-level visual images,
speaking is a more efficient and effective way to explain visual information.
Here we introduce a non-invasive neural decoder, termed MindGPT, which
interprets perceived visual stimuli into natural language from fMRI signals.
Specifically, our model builds on a visually guided neural encoder with a
cross-attention mechanism, which allows us to steer latent neural
representations toward a desired language-semantic direction in an end-to-end
manner through the collaborative use of the large language model GPT. In doing
so, we found that the neural representations of MindGPT are explainable and
can be used to evaluate the contributions of visual properties to language
semantics. Our experiments show that the generated word sequences faithfully
represent the visual information (including essential details) conveyed in the
seen stimuli. The results also suggest that, for language decoding tasks, the
higher visual cortex (HVC) is more semantically informative than the lower
visual cortex (LVC), and that using only the HVC can recover most of the
semantic information. The code of the MindGPT model will be publicly available
at https://github.com/JxuanC/MindGPT.
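For readers who want a concrete picture of the pipeline described above, the following is a minimal sketch of how such an fMRI-to-text decoder could be wired together: a linear fMRI encoder produces latent neural tokens, a cross-attention block guides them with visual features of the seen image, and a frozen GPT-2 continues the resulting prefix into a caption. The module names, dimensions, the use of GPT-2 from Hugging Face, the prefix-style conditioning, and the assumption that visual features are already projected to the language model's hidden size are illustrative choices, not the authors' released implementation (see the repository above for that).

```python
# Illustrative sketch only: module names, dimensions, the choice of GPT-2, and
# the prefix-style conditioning are assumptions, not the authors' released code.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel


class FMRIEncoder(nn.Module):
    """Maps flattened fMRI voxel activity (e.g., HVC voxels only) to latent tokens."""
    def __init__(self, n_voxels: int, d_model: int = 768, n_tokens: int = 16):
        super().__init__()
        self.proj = nn.Linear(n_voxels, n_tokens * d_model)
        self.n_tokens, self.d_model = n_tokens, d_model

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, n_voxels) -> (batch, n_tokens, d_model)
        return self.proj(voxels).view(-1, self.n_tokens, self.d_model)


class VisuallyGuidedBlock(nn.Module):
    """Cross-attention that pulls neural latents toward the visual feature space."""
    def __init__(self, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, neural_tokens, visual_feats):
        # Neural tokens query visual features (e.g., ViT patch embeddings of the seen image).
        attended, _ = self.cross_attn(neural_tokens, visual_feats, visual_feats)
        return self.norm(neural_tokens + attended)


class MindGPTSketch(nn.Module):
    def __init__(self, n_voxels: int):
        super().__init__()
        self.fmri_encoder = FMRIEncoder(n_voxels)
        self.guide = VisuallyGuidedBlock()
        self.gpt = GPT2LMHeadModel.from_pretrained("gpt2")  # frozen language model
        for p in self.gpt.parameters():
            p.requires_grad = False

    def forward(self, voxels, visual_feats, caption_ids):
        # The encoder and guide are trained end-to-end through the frozen GPT-2:
        # guided neural tokens form a prefix, and the LM loss is computed only on
        # the caption tokens that follow it.
        prefix = self.guide(self.fmri_encoder(voxels), visual_feats)
        tok_emb = self.gpt.transformer.wte(caption_ids)
        inputs = torch.cat([prefix, tok_emb], dim=1)
        ignore = torch.full(prefix.shape[:2], -100, dtype=torch.long,
                            device=caption_ids.device)
        labels = torch.cat([ignore, caption_ids], dim=1)
        return self.gpt(inputs_embeds=inputs, labels=labels).loss
```

At inference time one would drop the caption tokens and decode from the fMRI-derived prefix alone; restricting `voxels` to higher visual cortex ROIs would mirror the HVC-only analysis mentioned in the abstract. Whether MindGPT uses exactly this conditioning scheme is not stated in the abstract, so treat the sketch as one plausible reading rather than a specification.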
Related papers
- Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show a 43% improvement in predictive semantic accuracy.
arXiv Detail & Related papers (2024-11-11T16:51:17Z)
- Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics [8.749640179057469]
We use linear decoding over the learned representations of unimodal vision, unimodal language, and multimodal deep neural network (DNN) models to predict human beauty ratings of naturalistic images.
We show that unimodal vision models (e.g., SimCLR) account for the vast majority of the explainable variance in these ratings. Language-aligned vision models (e.g., SLIP) yield only small gains relative to unimodal vision models.
Taken together, these results suggest that whatever words we may eventually find to describe our experience of beauty, the ineffable computations of feedforward perception may provide sufficient foundation for that experience.
arXiv Detail & Related papers (2024-10-31T03:37:21Z)
- BrainDecoder: Style-Based Visual Decoding of EEG Signals [2.1611379205310506]
Decoding neural representations of visual stimuli from electroencephalography (EEG) offers valuable insights into brain activity and cognition.
Recent advancements in deep learning have significantly enhanced the field of visual decoding of EEG.
We present a novel visual decoding pipeline that emphasizes the reconstruction of the style, such as color and texture, of images viewed by the subject.
arXiv Detail & Related papers (2024-09-09T02:14:23Z)
- Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs [52.497823009176074]
Large Vision-Language Models (LVLMs) often produce responses that are misaligned with factual information, a phenomenon known as hallucination.
We introduce Visual Description Grounded Decoding (VDGD), a training-free method designed to enhance visual perception and improve reasoning capabilities in LVLMs.
arXiv Detail & Related papers (2024-05-24T16:21:59Z)
- Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain [0.0]
We take inspiration from neuroscience to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction.
We find that ResNets are more sensitive to saliency information than ViTs when trained with object classification objectives.
We show that semantic encoding is a key factor in aligning AI with human visual perception, while saliency suppression is a non-brain-like strategy.
arXiv Detail & Related papers (2024-04-29T15:05:42Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- DeViL: Decoding Vision features into Language [53.88202366696955]
Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks.
In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned.
We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language.
arXiv Detail & Related papers (2023-09-04T13:59:55Z)
- Multimodal Neurons in Pretrained Text-Only Transformers [52.20828443544296]
We identify "multimodal neurons" that convert visual representations into corresponding text.
We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z)
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
- Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision-and-language models, which typically adopt a Convolutional Neural Network (CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
arXiv Detail & Related papers (2021-07-07T07:22:41Z)
- Neural encoding with visual attention [17.020869686284165]
We propose a novel approach to neural encoding by including a trainable soft-attention module.
We find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns.
arXiv Detail & Related papers (2020-10-01T16:04:21Z)