Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM
- URL: http://arxiv.org/abs/2405.07840v1
- Date: Mon, 13 May 2024 15:25:11 GMT
- Title: Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM
- Authors: Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, Huiguang He
- Abstract summary: We introduce a novel method, the Brain Prompt GPT (BP-GPT).
By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text.
We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve significant improvements of up to 4.61% on METEOR and 2.43% on BERTScore.
- Score: 19.53589633360839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding language information from brain signals represents a vital research area within brain-computer interfaces, particularly in the context of deciphering semantic information from fMRI signals. However, many existing efforts concentrate on decoding small vocabulary sets, leaving space for the exploration of open-vocabulary continuous text decoding. In this paper, we introduce a novel method, the \textbf{Brain Prompt GPT (BP-GPT)}. By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals into stimulus text. Further, we introduce a text-to-text baseline and align the fMRI prompt to the text prompt. By introducing the text-to-text baseline, our BP-GPT can extract a more robust brain prompt and improve the decoding performance of the pre-trained LLM. We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve significant improvements of up to $4.61\%$ on METEOR and $2.43\%$ on BERTScore across all subjects compared to the state-of-the-art method. The experimental results demonstrate that using brain representation as a prompt to further drive an LLM for auditory neural decoding is feasible and effective.
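The core mechanism lends itself to a short sketch: an fMRI encoder produces a sequence of "brain prompt" embeddings that are fed to GPT-2 as a prefix via `inputs_embeds`, with the LM loss computed on the text positions only. The code below is a minimal illustration under assumed names and dimensions (`BrainPromptEncoder`, `n_voxels`, `prefix_len`); it is not the authors' released code, and the paper's fMRI-to-text prompt alignment is only indicated in a comment.

```python
# Minimal sketch of the BP-GPT idea: fMRI features -> prefix embeddings ->
# GPT-2 language-model loss on the stimulus text. Names/dims are assumptions.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class BrainPromptEncoder(nn.Module):
    """Maps an fMRI feature vector to a sequence of prefix embeddings."""
    def __init__(self, n_voxels: int, prefix_len: int, d_model: int):
        super().__init__()
        self.prefix_len, self.d_model = prefix_len, d_model
        self.proj = nn.Sequential(
            nn.Linear(n_voxels, d_model),
            nn.GELU(),
            nn.Linear(d_model, prefix_len * d_model),
        )

    def forward(self, fmri: torch.Tensor) -> torch.Tensor:
        # fmri: (batch, n_voxels) -> brain prompt: (batch, prefix_len, d_model)
        return self.proj(fmri).view(-1, self.prefix_len, self.d_model)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
encoder = BrainPromptEncoder(n_voxels=10000, prefix_len=10,
                             d_model=gpt2.config.n_embd)

fmri = torch.randn(1, 10000)   # stand-in for a preprocessed fMRI window
prefix = encoder(fmri)         # (1, 10, 768) "brain prompt"

# Teacher forcing: prepend the brain prompt to the embedded stimulus text
# and compute the LM loss on the text positions only (-100 masks the prefix).
text = tokenizer("I heard a story about", return_tensors="pt")
tok_emb = gpt2.transformer.wte(text.input_ids)
inputs = torch.cat([prefix, tok_emb], dim=1)
labels = torch.cat([torch.full((1, prefix.size(1)), -100),
                    text.input_ids], dim=1)
loss = gpt2(inputs_embeds=inputs, labels=labels).loss
# The paper additionally aligns this fMRI prompt to a text-to-text prompt;
# an MSE term between the two prefix sequences would be one way to add that.
```

At inference time, a recent `transformers` release can decode directly from the prefix with `gpt2.generate(inputs_embeds=prefix, max_new_tokens=32)`.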
Related papers
- BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [29.78480739360263]
We propose a new multi-stage strategy for semantic brain signal decoding via vEctor-quantized speCtrogram reconstruction.
BrainECHO successively conducts: 1) autoencoding of the audio spectrogram; 2) brain-audio latent space alignment; and 3) semantic text generation via Whisper fine-tuning (the stage-1 VQ bottleneck is sketched below).
BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources.
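BrainECHO's first stage compresses the audio spectrogram through a vector-quantized bottleneck. Below is a generic VQ layer of that kind (standard VQ-VAE nearest-codebook lookup with a straight-through estimator), written as an illustration rather than taken from the paper; codebook size and latent width are assumptions.

```python
# Generic vector-quantization bottleneck (VQ-VAE style), as assumed for the
# spectrogram-autoencoding stage; not BrainECHO's actual implementation.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes: int = 512, d: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, d)

    def forward(self, z: torch.Tensor):
        # z: (batch, T, d) continuous spectrogram latents
        dist = ((z.unsqueeze(2) - self.codebook.weight) ** 2).sum(-1)
        idx = dist.argmin(-1)                     # nearest code per frame
        zq = self.codebook(idx)                   # quantized latents
        zq = z + (zq - z).detach()                # straight-through gradients
        commit = ((zq.detach() - z) ** 2).mean()  # commitment loss term
        return zq, idx, commit

zq, idx, commit = VectorQuantizer()(torch.randn(2, 100, 64))
print(zq.shape, idx.shape)  # torch.Size([2, 100, 64]) torch.Size([2, 100])
```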
arXiv Detail & Related papers (2024-10-19T04:29:03Z)
- A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on an encoder derived from a specific transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than that present in the state of the art.
A benchmark is performed on a corpus of human-human and human-robot interactions in which fMRI and conversational signals were recorded synchronously.
arXiv Detail & Related papers (2024-09-29T14:03:39Z)
- MAD: Multi-Alignment MEG-to-Text Decoding [21.155031900491654]
We present a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments.
We achieve an impressive BLEU-1 score on the GWilliams dataset, significantly improving over the baseline from 5.49 to 10.44.
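For reference, BLEU-1 is BLEU restricted to unigram precision (with brevity penalty); a quick sanity-check computation with NLTK on made-up token lists:

```python
# BLEU-1 = unigram-precision BLEU; token lists here are toy examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "subject", "heard", "a", "short", "story"]
hypothesis = ["the", "subject", "heard", "a", "story"]
bleu1 = sentence_bleu([reference], hypothesis, weights=(1, 0, 0, 0),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-1 = {bleu1:.2f}")
```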
arXiv Detail & Related papers (2024-06-03T16:43:10Z)
- MindFormer: Semantic Alignment of Multi-Subject fMRI for Brain Decoding [50.55024115943266]
We introduce MindFormer, a novel method for semantic alignment of multi-subject fMRI signals.
This model is specifically designed to generate fMRI-conditioned feature vectors that can condition a Stable Diffusion model for fMRI-to-image generation or a large language model (LLM) for fMRI-to-text generation (a shape-level sketch follows below).
Our experimental results demonstrate that MindFormer generates semantically consistent images and text across different subjects.
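Shape-wise, the design described above amounts to mapping fMRI features, plus a subject identifier so that multiple subjects share one semantic space, into the conditioning sequence a downstream generator consumes. The sketch below targets the (77, 768) CLIP-style sequence a Stable Diffusion UNet cross-attends to; all module names and dimensions are assumptions, not MindFormer's architecture.

```python
# Hypothetical fMRI conditioner: project voxels to a (77, 768) conditioning
# sequence and add a learned subject embedding for cross-subject alignment.
import torch
import torch.nn as nn

class FMRIConditioner(nn.Module):
    def __init__(self, n_voxels: int, n_subjects: int,
                 seq_len: int = 77, d: int = 768):
        super().__init__()
        self.seq_len, self.d = seq_len, d
        self.subject_emb = nn.Embedding(n_subjects, d)
        self.proj = nn.Linear(n_voxels, seq_len * d)

    def forward(self, fmri: torch.Tensor, subject: torch.Tensor) -> torch.Tensor:
        cond = self.proj(fmri).view(-1, self.seq_len, self.d)
        return cond + self.subject_emb(subject).unsqueeze(1)

cond = FMRIConditioner(8000, n_subjects=4)(torch.randn(2, 8000),
                                           torch.tensor([0, 3]))
print(cond.shape)  # torch.Size([2, 77, 768])
```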
arXiv Detail & Related papers (2024-05-28T00:36:25Z)
- How Many Bytes Can You Take Out Of Brain-To-Text Decoding? [45.665946951551746]
We propose an information-based evaluation metric for brain-to-text decoders.
We show two methods to augment existing state-of-the-art continuous text decoders.
We conclude that a practical brain-to-text decoder is likely possible given further algorithmic improvements.
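As one toy instance of an information-style score (an assumption for illustration, not the paper's metric), the information a decoded transcript carries about the reference can be approximated as the reduction in compressed code length, NCD-style:

```python
# Toy information-style score: bits(ref) - bits(ref | decoded), with the
# conditional approximated by compressing the concatenation. Illustrative
# stand-in only; not the evaluation metric proposed in the paper.
import zlib

def bits(s: str) -> int:
    return 8 * len(zlib.compress(s.encode()))

def info_gain_bits(reference: str, decoded: str) -> int:
    return bits(reference) - (bits(decoded + reference) - bits(decoded))

print(info_gain_bits("the man walked his dog in the park",
                     "a man was walking a dog"))
```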
arXiv Detail & Related papers (2024-05-22T22:57:04Z)
- Query Augmentation by Decoding Semantics from Brain Signals [61.89860975682576]
We propose Brain-Aug, which enhances a query by incorporating semantic information decoded from brain signals.
Experimental results on fMRI datasets show that Brain-Aug produces semantically more accurate queries.
arXiv Detail & Related papers (2024-02-24T04:08:51Z)
- Chat2Brain: A Method for Mapping Open-Ended Semantic Queries to Brain Activation Maps [59.648646222905235]
We propose a method called Chat2Brain that combines LLMs with a basic text-to-image model, known as Text2Brain, to map semantic queries to brain activation maps.
We demonstrate that Chat2Brain can synthesize plausible neural activation patterns for more complex text queries.
arXiv Detail & Related papers (2023-09-10T13:06:45Z)
- UniCoRN: Unified Cognitive Signal ReconstructioN bridging cognitive signals and human language [23.623579364849526]
We propose fMRI2text, the first open-vocabulary task aiming to bridge fMRI time series and human language.
We present a baseline solution, UniCoRN: the Unified Cognitive Signal ReconstructioN for Brain Decoding.
Our model achieves a 34.77% BLEU score on fMRI2text, and a 37.04% BLEU score when generalized to EEG-to-text decoding.
arXiv Detail & Related papers (2023-07-06T05:26:49Z)
- Probing Brain Context-Sensitivity with Masked-Attention Generation [87.31930367845125]
We use GPT-2 transformers to generate word embeddings that capture a fixed amount of contextual information.
We then test whether these embeddings can predict fMRI brain activity in humans listening to naturalistic text.
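This is the standard encoding-model recipe: fit a ridge regression from contextual embeddings to voxel responses and score held-out prediction correlation. The sketch below uses random arrays in place of real GPT-2 features and BOLD data.

```python
# Encoding-model sketch: ridge regression from (toy) GPT-2 word embeddings
# to (toy) voxel responses, scored by per-voxel prediction correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 768))    # embedding per fMRI volume (toy)
Y = rng.standard_normal((500, 1000))   # voxel responses (toy)

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X_tr, Y_tr)
pred = model.predict(X_te)

corr = [np.corrcoef(pred[:, v], Y_te[:, v])[0, 1] for v in range(Y.shape[1])]
print(f"mean held-out correlation: {np.mean(corr):.3f}")
```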
arXiv Detail & Related papers (2023-05-23T09:36:21Z)
- Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open-vocabulary electroencephalography (EEG)-to-text sequence-to-sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
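Mechanically, open-vocabulary EEG-to-text can be phrased as sequence-to-sequence generation: project word-level EEG feature vectors into a pre-trained seq2seq model's embedding space and train with teacher forcing. The sketch below assumes BART and an 840-dimensional EEG feature, both illustrative choices.

```python
# Hedged seq2seq sketch: word-level EEG features -> BART encoder embeddings
# -> cross-entropy against the sentence being read. Shapes are assumptions.
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

proj = nn.Linear(840, bart.config.d_model)  # 840: toy EEG feature size
eeg = torch.randn(1, 20, 840)               # 20 word-level EEG windows

labels = tok("he reached for the book", return_tensors="pt").input_ids
loss = bart(inputs_embeds=proj(eeg), labels=labels).loss
print(loss)
```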
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
- Brain2Word: Decoding Brain Activity for Language Generation [14.24200473508597]
We present a model that can decode fMRI data from unseen subjects.
Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy in this challenging task.
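Top-1/Top-5 accuracy here is the usual ranking measure: a trial counts as correct if the true word is among the k highest-scoring candidates. A quick reference implementation over a toy vocabulary:

```python
# Top-k accuracy over a toy vocabulary; the 180-word size is illustrative.
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    topk = logits.topk(k, dim=-1).indices              # (batch, k)
    return (topk == targets.unsqueeze(-1)).any(-1).float().mean().item()

logits = torch.randn(8, 180)
targets = torch.randint(0, 180, (8,))
print(topk_accuracy(logits, targets, k=5))
```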
arXiv Detail & Related papers (2020-09-10T10:47:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.