Related papers: Brain-to-Text Benchmark '24: Lessons Learned

URL: http://arxiv.org/abs/2412.17227v1
Date: Mon, 23 Dec 2024 02:44:35 GMT
Title: Brain-to-Text Benchmark '24: Lessons Learned
Authors: Francis R. Willett, Jingyuan Li, Trung Le, Chaofei Fan, Mingfei Chen, Eli Shlizerman, Yue Chen, Xin Zheng, Tatsuo S. Okubo, Tyler Benster, Hyun Dong Lee, Maxwell Kounga, E. Kelly Buchanan, David Zoltowski, Scott W. Linderman, Jaimie M. Henderson,
Abstract summary: Speech brain-computer interfaces aim to decipher what a person is trying to say from neural activity alone. The Brain-to-Text Benchmark '24 fosters the advancement of decoding algorithms that convert neural activity to text. The benchmark will remain open indefinitely to support further work towards increasing the accuracy of brain-to-text algorithms.
Score: 30.41641771704316
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speech brain-computer interfaces aim to decipher what a person is trying to say from neural activity alone, restoring communication to people with paralysis who have lost the ability to speak intelligibly. The Brain-to-Text Benchmark '24 and associated competition was created to foster the advancement of decoding algorithms that convert neural activity to text. Here, we summarize the lessons learned from the competition ending on June 1, 2024 (the top 4 entrants also presented their experiences in a recorded webinar). The largest improvements in accuracy were achieved using an ensembling approach, where the output of multiple independent decoders was merged using a fine-tuned large language model (an approach used by all 3 top entrants). Performance gains were also found by improving how the baseline recurrent neural network (RNN) model was trained, including by optimizing learning rate scheduling and by using a diphone training objective. Improving upon the model architecture itself proved more difficult, however, with attempts to use deep state space models or transformers not yet appearing to offer a benefit over the RNN baseline. The benchmark will remain open indefinitely to support further work towards increasing the accuracy of brain-to-text algorithms.
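
The diphone training objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a diphone target pairs each phoneme with the phoneme before it, so the decoder is trained on context-dependent labels instead of single phonemes. The function name `to_diphones`, the `SIL` boundary symbol, and the ARPAbet-style phoneme strings are illustrative assumptions, not details taken from the paper or the competition code.

```python
from typing import List, Tuple


def to_diphones(phonemes: List[str], boundary: str = "SIL") -> List[Tuple[str, str]]:
    """Turn a phoneme sequence into (previous, current) diphone labels.

    Assumption: the first phoneme is paired with a boundary symbol
    ("SIL" here is a hypothetical choice, not taken from the paper).
    """
    # Shift the sequence right by one position to get each phoneme's predecessor.
    previous = [boundary] + phonemes[:-1]
    return list(zip(previous, phonemes))


# Example: the phonemes of "hello" in an ARPAbet-like notation.
labels = to_diphones(["HH", "AH", "L", "OW"])
print(labels)
# [('SIL', 'HH'), ('HH', 'AH'), ('AH', 'L'), ('L', 'OW')]
```

Under this assumption, each target carries one phoneme of left context. The label set becomes much larger than the phoneme inventory in exchange for that context.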

Related papers

The 2025 PNPL Competition: Speech Detection and Phoneme Classification in the LibriBrain Dataset [10.214825301231025]
Speech decoding from non-invasive brain data holds potential for profound societal impact. The ultimate aim of the 2025 PNPL competition is to produce the conditions for an "ImageNet moment". We present the largest within-subject MEG dataset recorded to date (LibriBrain) together with a user-friendly Python library (pnpl). The competition features a Standard track that emphasises algorithmic innovation, as well as an Extended track that is expected to reward larger-scale computing.
arXiv Detail & Related papers (2025-06-11T20:34:33Z)
Language Reconstruction with Brain Predictive Coding from fMRI Data [28.217967547268216]
The theory of predictive coding suggests that the human brain naturally engages in continuously predicting future word representations. PredFT achieves current state-of-the-art decoding performance with a maximum BLEU-1 score of 27.8%.
arXiv Detail & Related papers (2024-05-19T16:06:02Z)
Meta predictive learning model of languages in neural circuits [2.5690340428649328]
We propose a mean-field learning model within the predictive coding framework. Our model reveals that most of the connections become deterministic after learning. Our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
arXiv Detail & Related papers (2023-09-08T03:58:05Z)
Employing Hybrid Deep Neural Networks on Dari Speech [0.0]
This article focuses on the recognition of individual words in the Dari language using the Mel-frequency cepstral coefficients (MFCCs) feature extraction method. We evaluate three different deep neural network models: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Multilayer Perceptron (MLP).
arXiv Detail & Related papers (2023-05-04T23:10:53Z)
BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience. Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data. In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z)
Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings. Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words. We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate. We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks. In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks. Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data. We obtain word-level confidence scores by utilizing several types of features calculated during decoding. The proposed time stamping method can get less than 50ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
Human Sentence Processing: Recurrence or Attention? [3.834032293147498]
The recently introduced Transformer architecture outperforms RNNs on many natural language processing tasks. We compare Transformer- and RNN-based language models' ability to account for measures of human reading effort.
arXiv Detail & Related papers (2020-05-19T14:17:49Z)
Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem. We use the language identities to bias the model to predict the CS points. This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.