LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
- URL: http://arxiv.org/abs/2506.02098v1
- Date: Mon, 02 Jun 2025 17:59:41 GMT
- Title: LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale
- Authors: Miran Özdogan, Gilad Landau, Gereon Elvers, Dulhan Jayalath, Pratik Somaiya, Francesco Mantegna, Mark Woolrich, Oiwi Parker Jones
- Abstract summary: LibriBrain is the largest single-subject MEG dataset to date for speech decoding. This unprecedented 'depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods.
- Score: 2.225053366951265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings -- 5× larger than the next comparable dataset and 50× larger than most. This unprecedented 'depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline experiments demonstrate that increasing training data yields substantial improvements in decoding performance, highlighting the value of scaling up deep, within-subject datasets. By releasing this dataset, we aim to empower the research community to advance speech decoding methodologies and accelerate the development of safe, effective clinical brain-computer interfaces.
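The abstract mentions a companion Python library for integration with deep learning frameworks but does not spell out its API. As a minimal sketch of what such integration might look like for the speech-detection baseline task, the following wraps windowed MEG data as a PyTorch Dataset; all names, window sizes, and the stand-in data are hypothetical, not the actual LibriBrain API.

```python
# Hypothetical sketch: wrapping continuous MEG plus speech/silence labels
# as a PyTorch Dataset for a speech-detection baseline. The tensors below
# stand in for data the companion library would provide.
import torch
from torch.utils.data import Dataset, DataLoader

class SpeechDetectionWindows(Dataset):
    """Fixed-length MEG windows with a binary speech/silence label."""

    def __init__(self, meg: torch.Tensor, labels: torch.Tensor,
                 window: int = 250, hop: int = 125):
        self.meg, self.labels = meg, labels
        self.window, self.hop = window, hop
        self.n = max(0, (meg.shape[1] - window) // hop + 1)

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, i: int):
        start = i * self.hop
        x = self.meg[:, start:start + self.window]  # (channels, time) window
        # Call the window "speech" if most of its samples overlap speech.
        y = self.labels[start:start + self.window].float().mean() > 0.5
        return x, y.long()

# Usage with random stand-in data: 306 sensors, 10 s at 250 Hz.
meg = torch.randn(306, 2500)
labels = (torch.rand(2500) > 0.5).long()
loader = DataLoader(SpeechDetectionWindows(meg, labels), batch_size=8, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([8, 306, 250]) torch.Size([8])
```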
Related papers
- Decoding individual words from non-invasive brain recordings across 723 participants [9.9068852821927]
We introduce a novel deep learning pipeline to decode individual words from non-invasive electro- (EEG) and magneto-encephalography (MEG) signals. We train and evaluate our approach on an unprecedentedly large number of participants exposed to five million words either written or spoken in English, French or Dutch.
arXiv Detail & Related papers (2024-12-11T15:53:49Z)
- The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning [3.649801602551928]
We develop self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data.
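The summary names self-supervised objectives for heterogeneous brain recordings without detailing them. As one generic illustration of the family of techniques involved (not the paper's actual objectives), here is a masked-reconstruction loss over sensor time series.

```python
# Illustrative self-supervised objective for brain time series: generic
# masked reconstruction. NOT the specific objectives used in the paper.
import torch
import torch.nn as nn

def masked_reconstruction_loss(encoder: nn.Module, x: torch.Tensor,
                               mask_ratio: float = 0.3) -> torch.Tensor:
    """x: (batch, channels, time). Hide random time steps and train the
    encoder to reconstruct the original signal at the hidden steps."""
    mask = (torch.rand(x.shape[0], 1, x.shape[2], device=x.device)
            < mask_ratio).expand_as(x)
    x_masked = x.masked_fill(mask, 0.0)
    recon = encoder(x_masked)  # assumed to preserve the input shape
    # Score reconstruction only where the input was hidden.
    return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

# Usage with a trivial stand-in encoder:
encoder = nn.Conv1d(306, 306, kernel_size=5, padding=2)
x = torch.randn(4, 306, 500)
loss = masked_reconstruction_loss(encoder, x)
loss.backward()
```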
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Deep Learning for real-time neural decoding of grasp [0.0]
We present a Deep Learning-based approach to the decoding of neural signals for grasp type classification.
The main goal of the presented approach is to improve over state-of-the-art decoding accuracy without relying on any prior neuroscience knowledge.
arXiv Detail & Related papers (2023-11-02T08:26:29Z)
- On decoder-only architecture for speech-to-text and large language model integration [59.49886892602309]
Speech-LLaMA is a novel approach that effectively incorporates acoustic information into text-based large language models.
We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines.
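The summary says acoustic information is incorporated into a text-based LLM via a decoder-only architecture. One common recipe consistent with that description is to project audio-encoder outputs into the LLM's embedding space and prepend them as a soft prefix; the sketch below illustrates that pattern with illustrative module names and sizes, not the paper's exact design.

```python
# Sketch: feeding acoustic features to a decoder-only LM by projecting
# them into the text-embedding space and prepending them as a prefix.
# All dimensions and the stand-in decoder are illustrative.
import torch
import torch.nn as nn

class AudioPrefixLM(nn.Module):
    def __init__(self, lm_embed: nn.Embedding, decoder: nn.Module,
                 audio_dim: int = 512):
        super().__init__()
        self.lm_embed = lm_embed
        self.decoder = decoder  # any decoder-only transformer stack
        self.proj = nn.Linear(audio_dim, lm_embed.embedding_dim)

    def forward(self, audio_feats: torch.Tensor, token_ids: torch.Tensor):
        # audio_feats: (B, T_audio, audio_dim); token_ids: (B, T_text)
        prefix = self.proj(audio_feats)            # (B, T_audio, D)
        text = self.lm_embed(token_ids)            # (B, T_text, D)
        inputs = torch.cat([prefix, text], dim=1)  # audio tokens first
        return self.decoder(inputs)

# Usage; a real decoder-only LM would also apply a causal attention mask.
embed = nn.Embedding(32000, 256)
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(256, 4, batch_first=True), num_layers=2)
model = AudioPrefixLM(embed, decoder, audio_dim=512)
out = model(torch.randn(2, 50, 512), torch.randint(0, 32000, (2, 10)))
print(out.shape)  # torch.Size([2, 60, 256])
```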
arXiv Detail & Related papers (2023-07-08T06:47:58Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
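The summary describes a contrastive-retrieval setup: embed MEG windows and candidate speech representations in a shared space, then identify the matching segment by nearest neighbour among the candidates. Here is a minimal sketch of that idea with an InfoNCE loss; dimensions and the random stand-in data are assumptions, not the paper's implementation.

```python
# Sketch of contrastive training plus retrieval-style identification of
# a speech segment from a brain embedding. Illustrative only.
import torch
import torch.nn.functional as F

def info_nce(meg_emb: torch.Tensor, speech_emb: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """Both inputs: (batch, dim); row i of each side is a matched pair."""
    meg_emb = F.normalize(meg_emb, dim=-1)
    speech_emb = F.normalize(speech_emb, dim=-1)
    logits = meg_emb @ speech_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(meg_emb.shape[0], device=meg_emb.device)
    return F.cross_entropy(logits, targets)  # match row i to column i

# Retrieval at test time: pick the candidate speech segment whose
# embedding is closest to the MEG embedding (top-1 over 1,000 candidates).
meg = F.normalize(torch.randn(1, 128), dim=-1)
candidates = F.normalize(torch.randn(1000, 128), dim=-1)
best = (meg @ candidates.t()).argmax(dim=-1)
```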
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing [1.345669927504424]
The "MEG-MASC" dataset provides a curated set of raw magnetoencephalography (MEG) recordings of 27 English speakers.
We time-stamp the onset and offset of each word and phoneme in the metadata of the recording, and organize the dataset according to the 'Brain Imaging Data Structure' (BIDS).
This data collection provides a suitable benchmark for large-scale encoding and decoding analyses of temporally-resolved brain responses to speech.
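Because the dataset follows BIDS, it can be read with standard tooling such as MNE-BIDS. A short sketch follows; the root path and the subject/session/task labels are placeholders, and the actual entity values should be taken from the dataset's BIDS metadata.

```python
# Sketch: loading a BIDS-formatted MEG recording with MNE-BIDS.
# Path and entity labels below are placeholders.
from mne_bids import BIDSPath, read_raw_bids

bids_path = BIDSPath(root="path/to/MEG-MASC",
                     subject="01", session="0", task="0", datatype="meg")
raw = read_raw_bids(bids_path=bids_path)
raw.load_data()

# BIDS events.tsv rows (here, word/phoneme onsets and offsets) surface
# as annotations on the raw object after reading.
print(raw.annotations[:5])
raw.pick("meg").filter(l_freq=0.1, h_freq=40.0)  # typical preprocessing
```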
arXiv Detail & Related papers (2022-07-26T19:17:01Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate model of speech processing in the brain.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
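Comparisons like this typically extract layer-wise model representations of the speech stimulus and regress them against brain responses. The sketch below shows how Wav2Vec 2.0 hidden states can be extracted with the HuggingFace transformers library; the checkpoint name is one public model, not necessarily the one used in the paper, and the waveform is a random stand-in.

```python
# Sketch: extracting layer-wise Wav2Vec 2.0 representations, the kind of
# features compared against brain activity in encoding analyses.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base"  # one public checkpoint (assumption)
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

waveform = torch.randn(16000)  # stand-in: 1 s of audio at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000,
                   return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One (1, frames, 768) tensor per layer; in an encoding analysis these
# would be regressed against time-resolved brain responses.
print(len(out.hidden_states), out.hidden_states[-1].shape)
```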
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open-vocabulary electroencephalography (EEG)-to-text sequence-to-sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
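For readers unfamiliar with the reported metric: BLEU-1 scores unigram overlap between decoded and reference text (with a brevity penalty). A quick illustration using NLTK follows; this is not the authors' evaluation code, and the example sentences are invented.

```python
# Illustration of the BLEU-1 metric: unigram precision with a brevity
# penalty, computed here with NLTK. Example sentences are invented.
from nltk.translate.bleu_score import sentence_bleu

reference = ["the movie was surprisingly good".split()]
hypothesis = "the film was surprisingly good".split()

# weights=(1, 0, 0, 0) scores unigrams only, i.e. BLEU-1.
score = sentence_bleu(reference, hypothesis, weights=(1, 0, 0, 0))
print(f"BLEU-1 = {score:.2f}")  # 0.80: four of five unigrams match
```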
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
- SpeechBrain: A General-Purpose Speech Toolkit [73.0404642815335]
SpeechBrain is an open-source and all-in-one speech toolkit.
It is designed to facilitate the research and development of neural speech processing technologies.
It achieves competitive or state-of-the-art performance in a wide range of speech benchmarks.
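As a sense of what using the toolkit looks like, here is a short sketch that transcribes an audio file with one of SpeechBrain's published pretrained ASR models; the audio path is a placeholder, and newer SpeechBrain releases may expose the same class under a different module.

```python
# Usage sketch: transcription with a SpeechBrain pretrained ASR model.
# The audio path is a placeholder.
from speechbrain.pretrained import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",
)
print(asr.transcribe_file("path/to/audio.wav"))
```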
arXiv Detail & Related papers (2021-06-08T18:22:56Z)
- TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval [103.85002875155551]
We propose a novel generalized distillation method, TeachText, for exploiting large-scale language pretraining.
We extend our method to video-side modalities and show that we can effectively reduce the number of modalities used at test time.
Our approach advances the state of the art on several video retrieval benchmarks by a significant margin and adds no computational overhead at test time.
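One way to read the generalized-distillation idea: a retrieval student is trained so that its text-video similarity matrix matches an aggregate of similarity matrices produced by several teachers (e.g., multiple text encoders). The sketch below illustrates that concept; the aggregation and loss are assumptions, not the paper's exact formulation.

```python
# Sketch of generalized distillation on retrieval similarity matrices:
# the student matches the mean of the teachers' similarities.
# Conceptual illustration, not the paper's exact loss.
import torch
import torch.nn.functional as F

def teachtext_style_loss(student_sim: torch.Tensor,
                         teacher_sims: list) -> torch.Tensor:
    """student_sim and each teacher matrix: (num_texts, num_videos)."""
    target = torch.stack(teacher_sims).mean(dim=0)  # aggregate teachers
    return F.mse_loss(student_sim, target)

# Usage with random stand-in similarity matrices:
student = torch.randn(32, 32, requires_grad=True)
teachers = [torch.randn(32, 32) for _ in range(3)]
loss = teachtext_style_loss(student, teachers)
loss.backward()
```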
arXiv Detail & Related papers (2021-04-16T17:55:28Z)