Decoding Phone Pairs from MEG Signals Across Speech Modalities
- URL: http://arxiv.org/abs/2505.15355v1
- Date: Wed, 21 May 2025 10:31:34 GMT
- Title: Decoding Phone Pairs from MEG Signals Across Speech Modalities
- Authors: Xabier de Zuazo, Eva Navas, Ibon Saratxaga, Mathieu Bourguignon, Nicola Molinaro,
- Abstract summary: We investigated magnetoencephalography signals to decode phones from brain activity during speech production and perception tasks. Our results demonstrate significantly higher decoding accuracy during speech production compared to passive listening and playback modalities.
- Score: 0.4054486015338004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the neural mechanisms underlying speech production is essential for both advancing cognitive neuroscience theory and developing practical communication technologies. In this study, we investigated magnetoencephalography signals to decode phones from brain activity during speech production and perception (passive listening and voice playback) tasks. Using a dataset comprising 17 participants, we performed pairwise phone classification, extending our analysis to 15 phonetic pairs. Multiple machine learning approaches, including regularized linear models and neural network architectures, were compared to determine their effectiveness in decoding phonetic information. Our results demonstrate significantly higher decoding accuracy during speech production (76.6%) compared to passive listening and playback modalities (~51%), emphasizing the richer neural information available during overt speech. Among the models, the Elastic Net classifier consistently outperformed more complex neural networks, highlighting the effectiveness of traditional regularization techniques when applied to limited and high-dimensional MEG datasets. In addition, analysis of specific brain frequency bands revealed that low-frequency oscillations, particularly Delta (0.2-3 Hz) and Theta (4-7 Hz), contributed most substantially to decoding accuracy, suggesting that these bands encode critical speech production-related neural processes. Despite using advanced denoising methods, it remains unclear whether decoding solely reflects neural activity or whether residual muscular or movement artifacts also contributed, indicating the need for further methodological refinement. Overall, our findings underline the critical importance of examining overt speech production paradigms, which, despite their complexity, offer opportunities to improve brain-computer interfaces to help individuals with severe speech impairments.
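To make the decoding setup concrete, below is a minimal sketch of pairwise phone classification with an Elastic Net penalized logistic regression on band-limited MEG epochs, mirroring the Delta and Theta bands highlighted in the abstract. This is not the authors' released code: the array layout, sampling rate, hyperparameters (`l1_ratio`, `C`, 5-fold cross-validation) and the use of scikit-learn/SciPy are illustrative assumptions standing in for whatever toolchain the study actually used.

```python
from scipy.signal import butter, sosfiltfilt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def bandpass(epochs, lo, hi, sfreq, order=4):
    """Band-pass filter MEG epochs shaped (n_trials, n_channels, n_times).

    In practice filtering is usually applied to the continuous recording
    before epoching to avoid edge artifacts; filtering per epoch here only
    keeps the sketch self-contained.
    """
    sos = butter(order, [lo, hi], btype="bandpass", fs=sfreq, output="sos")
    return sosfiltfilt(sos, epochs, axis=-1)


def pairwise_phone_score(epochs, labels, sfreq=1000.0, band=(0.2, 3.0)):
    """Cross-validated accuracy for one phone pair in one frequency band."""
    filtered = bandpass(epochs, band[0], band[1], sfreq)
    X = filtered.reshape(len(filtered), -1)  # flatten channels x time per trial
    clf = make_pipeline(
        StandardScaler(),
        # Elastic Net regularization: mixed L1/L2 penalty on a linear model
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=1.0, max_iter=5000),
    )
    return cross_val_score(clf, X, labels, cv=5, scoring="accuracy").mean()


# Hypothetical usage: X_pair holds MEG epochs for two phones, y_pair their labels.
# for name, band in {"Delta": (0.2, 3.0), "Theta": (4.0, 7.0)}.items():
#     print(name, pairwise_phone_score(X_pair, y_pair, sfreq=1000.0, band=band))
```

A full analysis along the lines described above would repeat this over all 15 phone pairs and each frequency band; the design choice the sketch illustrates is strong regularization on flattened sensor-by-time features rather than a deep architecture.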
Related papers
- BrainStratify: Coarse-to-Fine Disentanglement of Intracranial Neural Dynamics [8.36470471250669]
Decoding speech directly from neural activity is a central goal in brain-computer interface (BCI) research. In recent years, exciting advances have been made through the growing use of intracranial field potential recordings, such as stereo-ElectroEncephaloGraphy (sEEG) and ElectroCorticoGraphy (ECoG). These neural signals capture rich population-level activity but present key challenges: (i) task-relevant neural signals are sparsely distributed across sEEG electrodes, and (ii) they are often entangled with task-irrelevant neural signals in both sEEG and ECoG.
arXiv Detail & Related papers (2025-05-26T19:36:39Z)
- neuro2voc: Decoding Vocalizations from Neural Activity [3.1913357260723303]
This master's project investigates experimental methods for decoding zebra finch motor outputs. XGBoost with SHAP analysis trained on spike rates revealed neuronal interaction patterns crucial for syllable classification. A combined contrastive learning-VAE framework successfully generated spectrograms from binned neural data.
arXiv Detail & Related papers (2025-02-02T11:09:31Z) - On Creating A Brain-To-Text Decoder [6.084958172018792]
This paper centers on the application of raw electroencephalogram (EEG) signals for decoding human brain activity. The investigation specifically scrutinizes the efficacy of brain-computer interfaces (BCI) in deciphering neural signals associated with speech production.
arXiv Detail & Related papers (2025-01-10T20:04:54Z) - Bridging Auditory Perception and Language Comprehension through MEG-Driven Encoding Models [0.12289361708127873]
We use Magnetoencephalography (MEG) data to analyze brain responses to spoken language stimuli. We develop two distinct encoding models: an audio-to-MEG encoder and a text-to-MEG encoder. Both models successfully predict neural activity, demonstrating significant correlations between estimated and observed MEG signals.
arXiv Detail & Related papers (2024-12-22T19:41:54Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture that is compatible with, and scalable within, deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once.
We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)