MindGPT: Interpreting What You See with Non-invasive Brain Recordings
- URL: http://arxiv.org/abs/2309.15729v1
- Date: Wed, 27 Sep 2023 15:35:20 GMT
- Title: MindGPT: Interpreting What You See with Non-invasive Brain Recordings
- Authors: Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan
- Abstract summary: We introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli into natural language from fMRI signals.
Our experiments show that the generated word sequences truthfully represented the visual information conveyed in the seen stimuli.
- Score: 24.63828455553959
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding of seen visual contents with non-invasive brain recordings has
important scientific and practical value. Efforts have been made to recover
the seen images from brain signals. However, most existing approaches cannot
faithfully reflect the visual contents due to insufficient image quality or
semantic mismatches. Compared with reconstructing pixel-level visual images,
speaking is a more efficient and effective way to explain visual information.
Here we introduce a non-invasive neural decoder, termed MindGPT, which
interprets perceived visual stimuli into natural language from fMRI signals.
Specifically, our model builds upon a visually guided neural encoder with a
cross-attention mechanism, which permits us to guide latent neural
representations towards a desired language semantic direction in an end-to-end
manner by the collaborative use of the large language model GPT. By doing so,
we found that the neural representations of the MindGPT are explainable, which
can be used to evaluate the contributions of visual properties to language
semantics. Our experiments show that the generated word sequences truthfully
represented the visual information (with essential details) conveyed in the
seen stimuli. The results also suggested that with respect to language decoding
tasks, the higher visual cortex (HVC) is more semantically informative than the
lower visual cortex (LVC), and using only the HVC can recover most of the
semantic information. The code of the MindGPT model will be publicly available
at https://github.com/JxuanC/MindGPT.
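The cross-attention mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the MindGPT implementation: it is plain single-head scaled dot-product attention, and it assumes (the abstract does not say) that queries come from fMRI-derived tokens while keys and values come from visual features. All names (`cross_attention`, `fmri_tokens`, `visual_keys`, `visual_values`) are hypothetical and the numbers are toy data.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: n_q vectors of dimension d (assumed: fMRI-derived tokens)
    keys:    n_k vectors of dimension d (assumed: visual features)
    values:  n_k vectors of dimension d_v (assumed: visual features)
    Returns (outputs, weights): n_q output vectors and n_q weight rows.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy data (hypothetical): 2 query tokens, 3 key/value tokens.
fmri_tokens = [[1.0, 0.0], [0.0, 1.0]]
visual_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attended, attn = cross_attention(fmri_tokens, visual_keys, visual_values)
```

Each row of `attn` sums to 1 and shows how strongly one query token attends to each key token. Inspecting such weights is one generic way to relate inputs to outputs in an attention model.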
Related papers
- VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap [52.497823009176074]
We perform an in-depth analysis of hallucinations and discover several novel insights about how and when LVLMs hallucinate.
To overcome this shortcoming, we propose Visual Description Grounded Decoding (VDGD), a simple, robust, and training-free method for alleviating hallucinations.
arXiv Detail & Related papers (2024-05-24T16:21:59Z)
- Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain [0.0]
We take inspiration from neuroscience to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction.
We find that ResNets are more sensitive to saliency information than ViTs, when trained with object classification objectives.
We show that semantic encoding is a key factor in aligning AI with human visual perception, while saliency suppression is a non-brain-like strategy.
arXiv Detail & Related papers (2024-04-29T15:05:42Z)
- Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS) for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z)
- DeViL: Decoding Vision features into Language [53.88202366696955]
Post-hoc explanation methods have often been criticised for abstracting away the decision-making process of deep neural networks.
In this work, we would like to provide natural language descriptions for what different layers of a vision backbone have learned.
We train a transformer network to translate individual image features of any vision layer into a prompt that a separate off-the-shelf language model decodes into natural language.
arXiv Detail & Related papers (2023-09-04T13:59:55Z)
- Multimodal Neurons in Pretrained Text-Only Transformers [52.20828443544296]
We identify "multimodal neurons" that convert visual representations into corresponding text.
We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z)
- Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals.
We incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z)
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z)
- Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision and language models, which typically adopt a Convolutional Neural Network (CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
arXiv Detail & Related papers (2021-07-07T07:22:41Z)
- Neural encoding with visual attention [17.020869686284165]
We propose a novel approach to neural encoding by including a trainable soft-attention module.
We find that attention locations estimated by the model on independent data agree well with the corresponding eye fixation patterns.
arXiv Detail & Related papers (2020-10-01T16:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.