Decoding speech perception from non-invasive brain recordings
- URL: http://arxiv.org/abs/2208.12266v2
- Date: Thu, 5 Oct 2023 15:54:11 GMT
- Title: Decoding speech perception from non-invasive brain recordings
- Authors: Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King
- Abstract summary: We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
- Score: 48.46819575538446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decoding speech from brain activity is a long-awaited goal in both healthcare
and neuroscience. Invasive devices have recently led to major milestones in
that regard: deep learning algorithms trained on intracranial recordings now
start to decode elementary linguistic features (e.g. letters, words,
spectrograms). However, extending this approach to natural speech and
non-invasive brain recordings remains a major challenge. Here, we introduce a
model trained with contrastive learning to decode self-supervised
representations of perceived speech from the non-invasive recordings of a large
cohort of healthy individuals. To evaluate this approach, we curate and
integrate four public datasets, encompassing 175 volunteers recorded with
magneto- or electro-encephalography (M/EEG), while they listened to short
stories and isolated sentences. The results show that our model can identify,
from 3 seconds of MEG signals, the corresponding speech segment with up to 41%
accuracy out of more than 1,000 distinct possibilities on average across
participants, and more than 80% in the very best participants - a performance
that allows the decoding of words and phrases absent from the training set. The
comparison of our model to a variety of baselines highlights the importance of
(i) a contrastive objective, (ii) pretrained representations of speech and
(iii) a common convolutional architecture simultaneously trained across
multiple participants. Finally, the analysis of the decoder's predictions
suggests that they primarily depend on lexical and contextual semantic
representations. Overall, this effective decoding of perceived speech from
non-invasive recordings delineates a promising path to decode language from
brain activity, without putting patients at risk for brain surgery.
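To make points (i)-(iii) concrete, the following is a minimal PyTorch sketch of such a contrastive decoding setup, not the authors' released model: a convolutional brain module shared across participants (with a per-subject gain standing in for subject-specific layers) is trained with an InfoNCE loss to align 3-second M/EEG windows with pretrained speech embeddings, then evaluated by segment retrieval. Layer sizes, the subject-gain scheme, the sampling rate, and the random placeholder speech features are illustrative assumptions.

```python
# Minimal sketch (assumptions: layer sizes, subject-gain scheme, 120 Hz sampling,
# random placeholder speech embeddings) of a CLIP-style brain-to-speech decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BrainModule(nn.Module):
    """Convolutional M/EEG encoder shared across participants.

    A learned per-subject channel gain stands in for the paper's subject-specific
    layers, letting one common architecture train on many participants at once.
    """

    def __init__(self, n_channels=208, n_subjects=175, dim=768):
        super().__init__()
        self.subject_gain = nn.Embedding(n_subjects, n_channels)
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 320, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(320, 320, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv1d(320, dim, kernel_size=3, padding=1),
        )

    def forward(self, meg, subject_ids):
        # meg: (batch, channels, time); rescale channels per subject, then encode.
        gain = self.subject_gain(subject_ids).unsqueeze(-1)  # (batch, channels, 1)
        z = self.conv(meg * gain)                            # (batch, dim, time)
        return z.mean(dim=-1)                                # time-pooled embedding


def clip_loss(brain_z, speech_z, temperature=0.1):
    """Contrastive (InfoNCE) objective: each M/EEG window must retrieve its own
    speech segment among the other segments of the batch."""
    brain_z = F.normalize(brain_z, dim=-1)
    speech_z = F.normalize(speech_z, dim=-1)
    logits = brain_z @ speech_z.t() / temperature            # (batch, batch)
    targets = torch.arange(len(logits), device=logits.device)
    return F.cross_entropy(logits, targets)


# Toy usage. `speech_z` stands for frozen representations from a pretrained
# self-supervised speech model (e.g. wav2vec 2.0); random values keep it runnable.
batch, channels, time_steps = 16, 208, 360   # ~3 s of MEG at an assumed 120 Hz
brain_module = BrainModule(n_channels=channels)
meg = torch.randn(batch, channels, time_steps)
subjects = torch.randint(0, 175, (batch,))
speech_z = torch.randn(batch, 768)

loss = clip_loss(brain_module(meg, subjects), speech_z)

# Evaluation mirrors the "top-1 out of >1,000 segments" metric: rank candidate
# speech segments by cosine similarity to the MEG embedding and check whether
# the true segment comes first.
with torch.no_grad():
    sims = F.normalize(brain_module(meg, subjects), dim=-1) @ F.normalize(speech_z, dim=-1).t()
    top1_acc = (sims.argmax(dim=1) == torch.arange(batch)).float().mean()
```

In the paper itself, retrieval is performed against a pool of more than 1,000 candidate segments rather than the in-batch negatives shown here, but the ranking logic is the same.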
Related papers
- Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals [1.33134751838052]
This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding.
It focused on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech.
arXiv Detail & Related papers (2024-11-14T07:20:08Z)
- Decoding Continuous Character-based Language from Non-invasive Brain Recordings [33.11373366800627]
We propose a novel approach to decoding continuous language from single-trial non-invasive fMRI recordings.
A character-based decoder is designed for the semantic reconstruction of continuous language characterized by inherent character structures.
The ability to decode continuous language from single trials across subjects demonstrates the promising applications of non-invasive language brain-computer interfaces.
arXiv Detail & Related papers (2024-03-17T12:12:33Z)
- BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings, bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z)
- Jointly Learning Visual and Auditory Speech Representations from Raw Data [108.68531445641769]
RAVEn is a self-supervised multi-modal approach to jointly learn visual and auditory speech representations.
Our design is asymmetric across the two modalities, driven by the inherent differences between video and audio.
RAVEn surpasses all self-supervised methods on visual speech recognition.
arXiv Detail & Related papers (2022-12-12T21:04:06Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decomposing the neural bases of language is to correlate, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
- Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)