How Many Bytes Can You Take Out Of Brain-To-Text Decoding?
- URL: http://arxiv.org/abs/2405.14055v1
- Date: Wed, 22 May 2024 22:57:04 GMT
- Title: How Many Bytes Can You Take Out Of Brain-To-Text Decoding?
- Authors: Richard Antonello, Nihita Sarma, Jerry Tang, Jiaru Song, Alexander Huth
- Abstract summary: We propose an information-based evaluation metric for brain-to-text decoders.
We show two methods to augment existing state-of-the-art continuous text decoders.
We conclude that a practical brain-to-text decoder is likely possible given further algorithmic improvements.
- Score: 45.665946951551746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Brain-computer interfaces have promising medical and scientific applications for aiding speech and studying the brain. In this work, we propose an information-based evaluation metric for brain-to-text decoders. Using this metric, we examine two methods to augment existing state-of-the-art continuous text decoders. We show that these methods, in concert, can improve brain decoding performance by upwards of 40% when compared to a baseline model. We further examine the informatic properties of brain-to-text decoders and show empirically that they have Zipfian power law dynamics. Finally, we provide an estimate for the idealized performance of an fMRI-based text decoder. We compare this idealized model to our current model, and use our information-based metric to quantify the main sources of decoding error. We conclude that a practical brain-to-text decoder is likely possible given further algorithmic improvements.
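To make the information-based framing concrete, here is a minimal sketch of one way such a metric can be computed: the bits of information a decoded transcript carries about the reference transcript, measured as the drop in surprisal under a language model that also gets to see the decode. The add-one-smoothed unigram model is a toy stand-in for the actual language models used in the paper, and the function names are illustrative.

```python
import math
from collections import Counter

def surprisal_bits(reference, counts, total, vocab_size):
    """Total surprisal (bits) of `reference` under an add-one-smoothed unigram model."""
    return sum(
        -math.log2((counts.get(w, 0) + 1) / (total + vocab_size))
        for w in reference
    )

def bits_recovered(reference, decoded, corpus):
    """Bits of information about `reference` contributed by `decoded`:
    prior surprisal minus surprisal after the model also sees the decode."""
    vocab = set(corpus) | set(decoded) | set(reference)
    prior = Counter(corpus)
    posterior = prior + Counter(decoded)
    before = surprisal_bits(reference, prior, sum(prior.values()), len(vocab))
    after = surprisal_bits(reference, posterior, sum(posterior.values()), len(vocab))
    return before - after  # positive when the decode is informative

corpus = "the quick brown fox jumps over the lazy dog".split()
reference = "the fox jumps over the dog".split()
decoded = "the fox jumped over the dog".split()  # a near-miss decode
print(f"{bits_recovered(reference, decoded, corpus):.2f} bits recovered")
```

In a full evaluation a modern language model would replace the unigram counts, but the structure of the measurement, prior surprisal minus posterior surprisal, would stay the same.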
Related papers
- A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on (i) an encoder derived from a specific transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than those in the current state of the art.
A benchmark is performed on a corpus of human-human and human-robot interactions in which fMRI and conversational signals are recorded synchronously.
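As a rough illustration, not this paper's specific architecture, fMRI-conditioned text decoders of this general kind often project brain features into a language model's token-embedding space and prepend them as pseudo-token prefix embeddings. Everything below (the `fmri_prefix` helper, the shapes, the random projection) is hypothetical.

```python
import numpy as np

def fmri_prefix(fmri_vec, proj, n_prefix, d_model):
    """Map a (voxels,) fMRI feature vector to n_prefix pseudo-token embeddings."""
    return (fmri_vec @ proj).reshape(n_prefix, d_model)

voxels, n_prefix, d_model = 1024, 8, 64
rng = np.random.default_rng(0)
proj = 0.01 * rng.standard_normal((voxels, n_prefix * d_model))  # learned jointly in practice
prefix = fmri_prefix(rng.standard_normal(voxels), proj, n_prefix, d_model)
print(prefix.shape)  # (8, 64): prepended to the text-token embeddings fed to the LLM
```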
arXiv Detail & Related papers (2024-09-29T14:03:39Z) - Language Reconstruction with Brain Predictive Coding from fMRI Data [28.217967547268216]
The theory of predictive coding suggests that the human brain naturally engages in continuously predicting future word representations.
PredFT achieves current state-of-the-art decoding performance with a maximum BLEU-1 score of 27.8%.
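For reference, BLEU-1 is clipped unigram precision scaled by a brevity penalty; the sketch below computes it at the sentence level (published numbers such as the 27.8% above are typically corpus-level aggregates).

```python
import math
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    clipped = Counter(cand) & Counter(ref)   # per-word matches, clipped to reference counts
    precision = sum(clipped.values()) / max(len(cand), 1)
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return brevity * precision

print(bleu1("the man was reading a book", "a man was reading the newspaper"))  # ~0.83
```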
arXiv Detail & Related papers (2024-05-19T16:06:02Z) - Modality-Agnostic fMRI Decoding of Vision and Language [4.837421245886033]
We introduce and use a new large-scale fMRI dataset (8,500 trials per subject) of people watching both images and text descriptions.
This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing.
We train and evaluate such decoders to map brain signals onto stimulus representations from a large range of publicly available vision, language and multimodal (vision+language) models.
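A common skeleton for decoders of this kind, sketched here on synthetic data rather than the paper's dataset, is ridge regression from voxel responses into a model's stimulus-embedding space, scored by whether each predicted embedding identifies its own stimulus.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_voxels, emb_dim = 500, 200, 64

true_map = rng.standard_normal((n_voxels, emb_dim)) / np.sqrt(emb_dim)
stim_emb = rng.standard_normal((n_trials, emb_dim))              # model embeddings
brain = stim_emb @ true_map.T + 0.3 * rng.standard_normal((n_trials, n_voxels))

train, test = slice(0, 400), slice(400, None)
lam = 1.0
# Ridge solution W = (X^T X + lam*I)^{-1} X^T Y, mapping brain -> embedding space.
W = np.linalg.solve(brain[train].T @ brain[train] + lam * np.eye(n_voxels),
                    brain[train].T @ stim_emb[train])

pred = brain[test] @ W
ref = stim_emb[test]
sims = (pred / np.linalg.norm(pred, axis=1, keepdims=True)) @ (
    ref / np.linalg.norm(ref, axis=1, keepdims=True)).T
print("held-out top-1 identification:", (sims.argmax(1) == np.arange(len(ref))).mean())
```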
arXiv Detail & Related papers (2024-03-18T13:30:03Z) - Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey) [9.14580723964253]
Can we obtain insights about the brain using AI models?
How is the information in deep learning models related to brain recordings?
Decoding models solve the inverse problem of reconstructing stimuli given the fMRI recordings.
Inspired by the effectiveness of deep learning models for natural language processing, computer vision, and speech, several neural encoding and decoding models have been recently proposed.
arXiv Detail & Related papers (2023-07-17T06:54:36Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
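The identification setup can be sketched as follows, with synthetic embeddings standing in for the learned MEG and speech encoders (the contrastive training itself is omitted); accuracy in this toy depends entirely on the chosen noise level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, dim = 1000, 128

speech = rng.standard_normal((n_candidates, dim))                 # speech-segment embeddings
brain = speech + 3.0 * rng.standard_normal((n_candidates, dim))   # noisy brain "views"

# Cosine similarity between every brain window and every candidate segment.
s = speech / np.linalg.norm(speech, axis=1, keepdims=True)
b = brain / np.linalg.norm(brain, axis=1, keepdims=True)
scores = b @ s.T                                                  # (n_brain, n_candidates)

top1 = (scores.argmax(axis=1) == np.arange(n_candidates)).mean()
print(f"top-1 identification among {n_candidates} segments: {top1:.1%}")
```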
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z) - Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by a relative 19.2% (e.g., from 10.0 to 8.08) compared to the method without decoder pre-training.
arXiv Detail & Related papers (2022-03-31T15:33:56Z) - Adversarial Neural Networks for Error Correcting Codes [76.70040964453638]
We introduce a general framework to boost the performance and applicability of machine learning (ML) models.
We propose to combine ML decoders with a competing discriminator network that tries to distinguish between codewords and noisy words.
Our framework is game-theoretic, motivated by generative adversarial networks (GANs).
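A toy version of the discriminator side of that game is sketched below: a hand-crafted "distance from valid bits" feature plus logistic regression stands in for the neural discriminator, and random bit vectors stand in for real codewords; the decoder it would compete against is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 16, 2000

codewords = rng.integers(0, 2, (n, k)).astype(float)        # stand-in for real codewords
noisy = codewords + 0.4 * rng.standard_normal((n, k))       # the same words after channel noise

def feature(x):
    """Squared distance from the nearest bit pattern: 0 for clean codewords."""
    return ((x - np.round(x)) ** 2).sum(axis=1)

X = np.concatenate([feature(codewords), feature(noisy)])
y = np.concatenate([np.zeros(n), np.ones(n)])               # 1 = noisy word

w, b = 0.0, 0.0
for _ in range(500):                                        # gradient descent on logistic loss
    p = 1 / (1 + np.exp(-(w * X + b)))
    g = p - y
    w -= 0.05 * (g * X).mean()
    b -= 0.05 * g.mean()

acc = (((1 / (1 + np.exp(-(w * X + b)))) > 0.5) == y).mean()
print(f"discriminator accuracy: {acc:.1%}")
```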
arXiv Detail & Related papers (2021-12-21T19:14:44Z) - Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z) - Brain2Word: Decoding Brain Activity for Language Generation [14.24200473508597]
We present a model that can decode fMRI data from unseen subjects.
Our model achieves 5.22% Top-1 and 13.59% Top-5 accuracy in this challenging task.
arXiv Detail & Related papers (2020-09-10T10:47:36Z)