Toward a realistic model of speech processing in the brain with
self-supervised learning
- URL: http://arxiv.org/abs/2206.01685v2
- Date: Mon, 20 Mar 2023 10:11:41 GMT
- Title: Toward a realistic model of speech processing in the brain with
self-supervised learning
- Authors: Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec,
Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King
- Abstract summary: Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
- Score: 67.7130239674153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several deep neural networks have recently been shown to generate activations
similar to those of the brain in response to the same input. These algorithms,
however, remain largely implausible: they require (1) extraordinarily large
amounts of data, (2) unobtainable supervised labels, (3) textual rather than
raw sensory input, and/or (4) implausibly large memory (e.g. thousands of
contextual words). These elements highlight the need to identify algorithms
that, under these limitations, would suffice to account for both behavioral and
brain responses. Focusing on the issue of speech processing, we here
hypothesize that self-supervised algorithms trained on the raw waveform
constitute a promising candidate. Specifically, we compare a recent
self-supervised architecture, Wav2Vec 2.0, to the brain activity of 412
English, French, and Mandarin individuals recorded with functional Magnetic
Resonance Imaging (fMRI), while they listened to ~1h of audio books. Our
results are four-fold. First, we show that this algorithm learns brain-like
representations with as little as 600 hours of unlabelled speech -- a quantity
comparable to what infants can be exposed to during language acquisition.
Second, its functional hierarchy aligns with the cortical hierarchy of speech
processing. Third, different training regimes reveal a functional
specialization akin to the cortex: Wav2Vec 2.0 learns sound-generic,
speech-specific, and language-specific representations similar to those of the
prefrontal and temporal cortices. Fourth, we confirm the similarity of this
specialization with the behavior of 386 additional participants. These
elements, resulting from the largest neuroimaging benchmark to date, show how
self-supervised learning can account for a rich organization of speech
processing in the brain, and thus delineate a path to identify the laws of
language acquisition which shape the human brain.
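
The abstract specifies the comparison but not the analysis pipeline. As a rough illustration, the sketch below extracts layer-wise Wav2Vec 2.0 activations with the Hugging Face transformers library, then fits a ridge-regression encoding model to predict fMRI voxel responses and scores each layer with a cross-validated Pearson correlation (a "brain score"). The checkpoint name, the crude pooling of model frames to fMRI TRs, and the cross-validation scheme are illustrative assumptions rather than the authors' published code; real pipelines typically also convolve features with a hemodynamic response function.

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from transformers import Wav2Vec2Model

# Illustrative checkpoint; the paper trains its own models on varying data.
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model.eval()

# Placeholder audio: 60 s of 16 kHz waveform (replace with real audiobook audio).
waveform = torch.randn(1, 16000 * 60)

with torch.no_grad():
    out = model(waveform, output_hidden_states=True)
# hidden_states: tuple of (1, n_frames, 768) tensors, one per transformer layer.
hidden_states = out.hidden_states

# Placeholder fMRI: 30 TRs (TR = 2 s) x 1000 voxels (replace with real data).
n_trs, n_voxels = 30, 1000
fmri = np.random.randn(n_trs, n_voxels)

def pool_to_trs(layer_act, n_trs):
    """Average model frames falling within each fMRI TR (crude alignment;
    a real pipeline would also model the hemodynamic response)."""
    frames = layer_act.squeeze(0).numpy()          # (n_frames, dim)
    bins = np.array_split(frames, n_trs, axis=0)   # one chunk of frames per TR
    return np.stack([b.mean(axis=0) for b in bins])

def brain_score(X, Y, n_splits=5, alpha=1e3):
    """Cross-validated Pearson r between predicted and true voxel responses,
    averaged over voxels and folds."""
    scores = []
    for train, test in KFold(n_splits).split(X):
        pred = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[test])
        r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1]
             for v in range(Y.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

# Layer-wise scores: comparing these across layers and cortical regions is
# what lets one ask whether the model's functional hierarchy aligns with the
# cortical hierarchy of speech processing, as the abstract reports.
for layer, act in enumerate(hidden_states):
    X = pool_to_trs(act, n_trs)
    print(f"layer {layer:2d}: brain score = {brain_score(X, fmri):+.3f}")
```

With real data, the per-layer, per-region scores from such an encoding model are what operationalize claims like the abstract's second result: the layer that best predicts a region traces out a model-to-cortex hierarchy.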
Related papers
- Do self-supervised speech and language models extract similar
representations as human brain? [2.390915090736061]
Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception.
We evaluate the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2.
arXiv Detail & Related papers (2023-10-07T01:39:56Z) - BrainBERT: Self-supervised representation learning for intracranial
recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts with higher accuracy and much less data.
In the future, far more concepts will be decodable from neural recordings using representation learning, potentially unlocking the brain much as language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities (a minimal sketch of this contrastive objective appears after this list).
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Neural Language Models are not Born Equal to Fit Brain Data, but
Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
arXiv Detail & Related papers (2022-07-07T15:37:17Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - Long-range and hierarchical language predictions in brains and
algorithms [82.81964713263483]
We show that, while deep language algorithms are optimized to predict adjacent words, the human brain is tuned to make long-range and hierarchical predictions.
This study strengthens predictive coding theory and suggests a critical role of long-range and hierarchical predictions in natural language processing.
arXiv Detail & Related papers (2021-11-28T20:26:07Z) - Model-based analysis of brain activity reveals the hierarchy of language
in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z) - Inductive biases, pretraining and fine-tuning jointly account for brain
responses to speech [6.87854783185243]
We compare five types of deep neural networks to human brain responses elicited by spoken sentences.
The differences in brain-similarity across networks revealed three main results.
arXiv Detail & Related papers (2021-02-25T19:11:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.