Towards Decoding Brain Activity During Passive Listening of Speech
- URL: http://arxiv.org/abs/2402.16996v1
- Date: Mon, 26 Feb 2024 20:04:01 GMT
- Title: Towards Decoding Brain Activity During Passive Listening of Speech
- Authors: Milán András Fodor and Tamás Gábor Csapó and Frigyes Viktor Arthur
- Abstract summary: We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods.
This approach diverges from the conventional focus on speech production and instead chooses to investigate neural representations of perceived speech.
Despite the approach not having achieved a breakthrough yet, the research sheds light on the potential of decoding neural activity during speech perception.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of the study is to investigate the complex mechanisms of speech
perception and ultimately decode the electrical changes occurring in the brain
while listening to speech. We attempt to decode heard speech from intracranial
electroencephalographic (iEEG) data using deep learning methods. The goal is to
aid the advancement of brain-computer interface (BCI) technology for speech
synthesis, and, hopefully, to provide an additional perspective on the
cognitive processes of speech perception. This approach diverges from the
conventional focus on speech production and instead chooses to investigate
neural representations of perceived speech. This angle opened up a complex
perspective, potentially allowing us to study more sophisticated neural
patterns. Leveraging the power of deep learning models, the research aimed to
establish a connection between these intricate neural activities and the
corresponding speech sounds. Despite the approach not having achieved a
breakthrough yet, the research sheds light on the potential of decoding neural
activity during speech perception. Our current efforts can serve as a
foundation, and we are optimistic that expanding and improving upon this work
will bring us closer to more advanced BCIs and to a better understanding of
the processes underlying perceived speech and its relation to spoken speech.
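The abstract describes learning a mapping from intracranial neural activity to the corresponding speech sounds. As a hedged illustration of the simplest form such a decoder can take, the sketch below fits a ridge-regression baseline mapping iEEG feature frames to mel-spectrogram frames. All shapes and data are synthetic, hypothetical stand-ins; this is not the paper's actual model or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 2000 aligned time frames, 64 iEEG channels, 80 mel bins.
n_frames, n_channels, n_mels = 2000, 64, 80

# Synthetic stand-ins for iEEG features and the target spectrogram frames:
# the "speech" is a noisy linear function of the "neural" features.
X = rng.standard_normal((n_frames, n_channels))
W_true = rng.standard_normal((n_channels, n_mels))
Y = X @ W_true + 0.1 * rng.standard_normal((n_frames, n_mels))

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Split along time so train and test frames do not interleave.
split = int(0.8 * n_frames)
W = ridge_fit(X[:split], Y[:split])
Y_hat = X[split:] @ W

# Mean per-mel-bin Pearson correlation, a common decoding metric.
corr = np.mean([
    np.corrcoef(Y_hat[:, m], Y[split:, m])[0, 1] for m in range(n_mels)
])
print(f"mean correlation: {corr:.3f}")
```

On real iEEG such a linear baseline typically achieves far lower correlations than on this synthetic data, which is part of why the paper turns to deep learning models.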
Related papers
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
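The entry above names spike frequency adaptation as one of the inhibitory feedback mechanisms regulating neural activity. The toy simulation below is only a schematic illustration of that single mechanism, not the paper's architecture: in a leaky integrate-and-fire neuron, each spike increments an adaptation current, so inter-spike intervals lengthen under constant input. All constants are made up.

```python
import numpy as np

# Toy leaky integrate-and-fire neuron with spike-frequency adaptation.
dt, steps = 1.0, 500           # time step (ms) and number of steps
tau_v, tau_a = 20.0, 100.0     # membrane / adaptation time constants (ms)
v_th, beta, I = 1.0, 0.5, 2.0  # threshold, adaptation jump per spike, input drive

v, a, spikes = 0.0, 0.0, []
for t in range(steps):
    v += dt / tau_v * (-v + I - a)  # leaky integration of input minus adaptation
    if v >= v_th:
        spikes.append(t)
        v = 0.0       # reset membrane potential after a spike
        a += beta     # adaptation current grows with every spike
    a -= dt / tau_a * a  # adaptation decays slowly between spikes

# With constant input, accumulated adaptation stretches the
# inter-spike intervals over time.
isis = np.diff(spikes)
print(list(isis))
```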
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks [27.64740032872726]
We introduce the current brain-to-speech technology with the possibility of speech synthesis from brain signals.
We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech production.
arXiv Detail & Related papers (2023-12-10T08:12:08Z)
- BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create BrainBERT, a reusable Transformer for intracranial recordings that brings modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts with higher accuracy and much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
arXiv Detail & Related papers (2023-02-28T07:40:37Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
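The entry above frames decoding as retrieval: a brain segment is matched against more than 1,000 candidate speech segments. The sketch below illustrates only that evaluation step, using synthetic embeddings and cosine-similarity top-1 retrieval; it is not the paper's contrastively trained model, and the noise level and dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embedding dimension and candidate-pool size.
d, n_candidates = 128, 1000

# Pretend speech embeddings (e.g. from a self-supervised speech model)
# and brain embeddings produced by a trained decoder; here each brain
# embedding is simply its matching speech embedding plus noise.
speech = rng.standard_normal((n_candidates, d))
brain = speech + 0.5 * rng.standard_normal((n_candidates, d))

def normalize(v):
    """Scale each row to unit length so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# For each brain segment, rank every candidate speech segment by
# cosine similarity and check whether the true match ranks first.
sims = normalize(brain) @ normalize(speech).T
top1 = np.argmax(sims, axis=1)
accuracy = np.mean(top1 == np.arange(n_candidates))
print(f"top-1 accuracy over {n_candidates} candidates: {accuracy:.2%}")
```

With real MEG the brain embeddings are far noisier than this synthetic setup, which is why the reported accuracy is 41% rather than near-perfect.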
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Deep Learning for Visual Speech Analysis: A Survey [54.53032361204449]
This paper presents a review of recent progress in deep learning methods on visual speech analysis.
We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance.
arXiv Detail & Related papers (2022-05-22T14:44:53Z)
- Long-range and hierarchical language predictions in brains and algorithms [82.81964713263483]
We show that while deep language algorithms are optimized to predict adjacent words, the human brain appears tuned to make long-range and hierarchical predictions.
This study strengthens predictive coding theory and suggests a critical role of long-range and hierarchical predictions in natural language processing.
arXiv Detail & Related papers (2021-11-28T20:26:07Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decomposing the neural bases of language is to correlate, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
- Inductive biases, pretraining and fine-tuning jointly account for brain responses to speech [6.87854783185243]
We compare five types of deep neural networks to human brain responses elicited by spoken sentences.
The differences in brain-similarity across networks revealed three main results.
arXiv Detail & Related papers (2021-02-25T19:11:55Z)
- Bio-Inspired Modality Fusion for Active Speaker Detection [1.0644456464343592]
This paper presents a methodology for fusing correlated auditory and visual information for active speaker detection.
This ability has a wide range of applications, from teleconferencing systems to social robotics.
arXiv Detail & Related papers (2020-02-28T20:56:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.