The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
- URL: http://arxiv.org/abs/2406.04328v3
- Date: Tue, 08 Oct 2024 17:59:23 GMT
- Title: The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning
- Authors: Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones
- Abstract summary: We develop a set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous recordings. Results show that representations learned with these objectives scale with data, generalise across subjects, datasets, and tasks, and surpass comparable self-supervised approaches.
- Score: 3.649801602551928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The past few years have produced a series of spectacular advances in the decoding of speech from brain activity. The engine of these advances has been the acquisition of labelled data, with increasingly large datasets acquired from single subjects. However, participants exhibit individual differences, such as anatomy, and datasets use varied scanners and task designs. As a result, prior work has struggled to leverage data from multiple subjects, multiple datasets, multiple tasks, and unlabelled datasets. In turn, the field has not benefited from the rapidly growing number of open neural data repositories to exploit large-scale data and deep learning. This gap exists for all neural data, but especially for magnetoencephalography (MEG), where the scale of individual datasets has not yet caught up with other modalities. To address this, we develop a set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous and unlabelled neural recordings. Experimental results with MEG show that representations learned with these objectives scale with data, generalise across subjects, datasets, and tasks, outperform using the raw input representation, and even surpass comparable self-supervised approaches. In addition, we set new benchmarks for two foundational speech decoding tasks. Collectively, these methods now unlock the potential for training speech decoding models with orders of magnitude more existing data.
Related papers
- Neural decoding from stereotactic EEG: accounting for electrode variability across subjects [21.28778005847666]
We introduce seegnificant: a training framework that can be used to decode behavior across subjects using sEEG data.
We construct a multi-subject model trained on the combined data from 21 subjects performing a behavioral task.
arXiv Detail & Related papers (2024-11-01T17:58:01Z)
- Resolving Domain Shift For Representations Of Speech In Non-Invasive Brain Recordings [3.5297361401370044]
We focus on non-invasive data collected using magnetoencephalography (MEG).
To the best of our knowledge, this study is the first ever application of feature-level, deep learning based on MEG neuroimaging data.
arXiv Detail & Related papers (2024-10-25T21:56:23Z)
- Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
- BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation [6.5388528484686885]
This study introduces a novel approach towards the creation of medical foundation models.
Our method involves a novel two-stage pretraining approach using vision transformers.
BrainFounder demonstrates a significant performance gain, surpassing the achievements of previous winning solutions.
arXiv Detail & Related papers (2024-06-14T19:49:45Z)
- Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce Data [54.55126643084341]
Existing deep learning solutions suffer from three major limitations.
We introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network.
Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets.
arXiv Detail & Related papers (2024-01-01T10:20:01Z)
- Aligning brain functions boosts the decoding of visual semantics in novel subjects [3.226564454654026]
We propose to boost brain decoding by aligning brain responses to videos and static images across subjects.
Our method improves out-of-subject decoding performance by up to 75%.
It also outperforms classical single-subject approaches when fewer than 100 minutes of data is available for the tested subject.
arXiv Detail & Related papers (2023-12-11T15:55:20Z)
- Deep Learning for real-time neural decoding of grasp [0.0]
We present a Deep Learning-based approach to the decoding of neural signals for grasp type classification.
The main goal of the presented approach is to improve over state-of-the-art decoding accuracy without relying on any prior neuroscience knowledge.
arXiv Detail & Related papers (2023-11-02T08:26:29Z)
- A Unified, Scalable Framework for Neural Population Decoding [12.052847252465826]
We introduce a training framework and architecture designed to model the population dynamics of neural activity.
We construct a large-scale multi-session model trained on large datasets from seven nonhuman primates.
arXiv Detail & Related papers (2023-10-24T17:58:26Z)
- Fighting the scanner effect in brain MRI segmentation with a progressive level-of-detail network trained on multi-site data [1.6379393441314491]
LOD-Brain is a 3D convolutional neural network with progressive levels-of-detail able to segment brain data from any site.
It produces state-of-the-art results, with no significant difference in performance between internal and external sites.
Its portability opens the way for large scale application across different healthcare institutions, patient populations, and imaging technology manufacturers.
arXiv Detail & Related papers (2022-11-04T12:15:18Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- 2021 BEETL Competition: Advancing Transfer Learning for Subject Independence & Heterogenous EEG Data Sets [89.84774119537087]
We design two transfer learning challenges around diagnostics and Brain-Computer-Interfacing (BCI).
Task 1 is centred on medical diagnostics, addressing automatic sleep stage annotation across subjects.
Task 2 is centred on Brain-Computer Interfacing (BCI), addressing motor imagery decoding across both subjects and data sets.
arXiv Detail & Related papers (2022-02-14T12:12:20Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)
- Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms [8.747840760772268]
We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice.
Results show that most of the baselines given by ComParE are outperformed.
arXiv Detail & Related papers (2020-08-11T09:02:47Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence [48.765579605145454]
We propose to explicitly segment target text into fragment units and align them with their data correspondences.
The resulting architecture maintains the same expressive power as neural attention models.
On both E2E and WebNLG benchmarks, we show the proposed model consistently outperforms its neural attention counterparts.
arXiv Detail & Related papers (2020-05-03T14:28:28Z)
- DeGAN : Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and lack of relevant data, for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.