A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations
- URL: http://arxiv.org/abs/2506.13835v1
- Date: Mon, 16 Jun 2025 07:57:35 GMT
- Title: A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations
- Authors: Masakazu Inoue, Motoshige Sato, Kenichi Tomeoka, Nathania Nah, Eri Hatakeyama, Kai Arulkumaran, Ilya Horiguchi, Shuntaro Sasai
- Abstract summary: We introduce neural networks that can handle EEG/EMG with heterogeneous electrode placements. We show strong performance in silent speech decoding via multi-task training on large-scale EEG/EMG datasets.
- Score: 0.20075899678041528
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Silent speech decoding, which performs unvocalized human speech recognition from electroencephalography/electromyography (EEG/EMG), increases accessibility for speech-impaired humans. However, data collection is difficult and performed using varying experimental setups, making it nontrivial to collect a large, homogeneous dataset. In this study, we introduce neural networks that can handle EEG/EMG with heterogeneous electrode placements and show strong performance in silent speech decoding via multi-task training on large-scale EEG/EMG datasets. We achieve improved word classification accuracy in both healthy participants (95.3%) and a speech-impaired patient (54.5%), substantially outperforming models trained on single-subject data (70.1% and 13.2%). Moreover, our models also show gains in cross-language calibration performance. This increase in accuracy suggests the feasibility of developing practical silent speech decoding systems, particularly for speech-impaired patients.
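The page gives no architectural detail, so the following is a minimal sketch of one common way to handle heterogeneous electrode configurations, not necessarily the authors' model: each channel is embedded as a token tagged with a learned projection of its 3D electrode coordinates, and attention pooling keeps the encoder agnostic to channel count and ordering. All names and dimensions are illustrative.

```python
# Hypothetical sketch: an EEG/EMG encoder agnostic to electrode count and
# placement (NOT the paper's exact architecture). Each channel's waveform
# is embedded, tagged with a learned projection of its (x, y, z) electrode
# coordinates, and pooled by attention over the channel set.
import torch
import torch.nn as nn

class HeterogeneousEncoder(nn.Module):
    def __init__(self, n_samples: int, d_model: int = 128, n_classes: int = 10):
        super().__init__()
        self.wave_proj = nn.Linear(n_samples, d_model)   # per-channel waveform embedding
        self.pos_proj = nn.Linear(3, d_model)            # electrode coordinate embedding
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, d_model))  # single pooling query
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, signals: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # signals: (batch, n_channels, n_samples); positions: (batch, n_channels, 3)
        # n_channels may differ between recording setups.
        tokens = self.wave_proj(signals) + self.pos_proj(positions)
        query = self.query.expand(signals.size(0), -1, -1)
        pooled, _ = self.attn(query, tokens, tokens)     # attend over channels
        return self.head(pooled.squeeze(1))

# Two setups with different electrode counts share one model.
model = HeterogeneousEncoder(n_samples=256)
logits_a = model(torch.randn(4, 32, 256), torch.randn(4, 32, 3))  # 32-channel EEG cap
logits_b = model(torch.randn(4, 8, 256), torch.randn(4, 8, 3))    # 8-channel EMG montage
```

Because the pooling query attends over a variable-length set of channel tokens, recordings from different caps and montages can share one set of weights, which is what multi-task training across heterogeneous datasets requires.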
Related papers
- Decoding Phone Pairs from MEG Signals Across Speech Modalities [0.4054486015338004]
We investigated magnetoencephalography signals to decode phones from brain activity during speech production and perception tasks. Our results demonstrate significantly higher decoding accuracy during speech production compared to passive listening and playback modalities.
arXiv Detail & Related papers (2025-05-21T10:31:34Z)
- Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation [6.405846203953988]
Decoding speech from electroencephalography (EEG) has the potential to advance brain-computer interfaces (BCIs). EEG-based speech decoding faces major challenges, such as noisy data, limited datasets, and poor performance on complex tasks like speech perception. This study attempts to address these challenges by employing variational autoencoders (VAEs) for EEG data augmentation to improve data quality (see the sketch below).
arXiv Detail & Related papers (2025-01-08T08:55:10Z)
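As a concrete illustration of the VAE augmentation idea in the entry above (a sketch under assumed dimensions, not the paper's model), a small VAE is fit to flattened EEG epochs and synthetic epochs are decoded from prior samples:

```python
# Hedged sketch of VAE-style EEG augmentation (illustrative only): fit a
# VAE on flattened EEG epochs, then decode random latents to synthesize
# extra training examples.
import torch
import torch.nn as nn

class EEGVAE(nn.Module):
    def __init__(self, n_features: int, d_latent: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
        self.mu = nn.Linear(128, d_latent)
        self.logvar = nn.Linear(128, d_latent)
        self.dec = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(),
                                 nn.Linear(128, n_features))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.dec(z)
        # ELBO: reconstruction error plus KL divergence to the unit Gaussian
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
        loss = nn.functional.mse_loss(recon, x) + 1e-3 * kl
        return recon, loss

vae = EEGVAE(n_features=64 * 128)            # e.g. 64 channels x 128 samples, flattened
_, loss = vae(torch.randn(8, 64 * 128))      # one training step's forward pass
synthetic = vae.dec(torch.randn(8, 16))      # draw augmented epochs from the prior
```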
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-level variance-regularized spectral basis embedding (VR-SBE) features exploit a special regularization term to enforce homogeneity of speaker features during adaptation.
Feature-based learning hidden unit contributions (f-LHUC) transforms are conditioned on VR-SBE features, which are shown to be insensitive to speaker-level data quantity in test-time adaptation (see the LHUC sketch below).
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
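The f-LHUC transforms mentioned above build on LHUC adaptation, where hidden units are rescaled by speaker-dependent amplitudes 2*sigmoid(r). A minimal sketch, assuming the speaker features (a stand-in for VR-SBE features) arrive as a vector:

```python
# Illustrative LHUC-style layer (the general technique behind f-LHUC):
# hidden units are rescaled by speaker-conditioned amplitudes in (0, 2).
# In f-LHUC, the parameters r are predicted from speaker features rather
# than learned separately per speaker.
import torch
import torch.nn as nn

class FeatureLHUC(nn.Module):
    def __init__(self, d_hidden: int, d_speaker: int):
        super().__init__()
        self.to_scale = nn.Linear(d_speaker, d_hidden)  # predicts LHUC parameters r

    def forward(self, hidden: torch.Tensor, spk_feat: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_hidden); spk_feat: (batch, d_speaker)
        r = self.to_scale(spk_feat)
        return hidden * 2.0 * torch.sigmoid(r)          # per-unit amplitude scaling

layer = FeatureLHUC(d_hidden=256, d_speaker=32)
adapted = layer(torch.randn(4, 256), torch.randn(4, 32))
```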
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost (see the sketch below).
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
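For intuition only, here is a loose, generic sketch of augmentation in speaker-embedding space; it is not the DASA algorithm itself, and the difficulty factor and covariance-based perturbation are placeholders:

```python
# Generic embedding-space augmentation sketch (inspired by, but NOT, DASA):
# perturb each embedding along its class's estimated covariance, with a
# hypothetical difficulty factor controlling perturbation strength.
# Cost is only a sampling step per example, no extra forward passes.
import numpy as np

def augment_embeddings(emb, labels, difficulty=1.0, n_aug=1, rng=None):
    """emb: (N, D) speaker embeddings; labels: (N,) speaker ids."""
    rng = rng or np.random.default_rng(0)
    out_e, out_y = [emb], [labels]
    for spk in np.unique(labels):
        cls = emb[labels == spk]
        cov = np.cov(cls, rowvar=False) + 1e-6 * np.eye(emb.shape[1])
        for _ in range(n_aug):
            noise = rng.multivariate_normal(np.zeros(emb.shape[1]),
                                            difficulty * cov, size=len(cls))
            out_e.append(cls + noise)                 # perturbed copies, same label
            out_y.append(np.full(len(cls), spk))
    return np.concatenate(out_e), np.concatenate(out_y)

emb, labels = np.random.randn(20, 8), np.repeat([0, 1], 10)
aug_e, aug_y = augment_embeddings(emb, labels, difficulty=0.5)
```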
- Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders (see the sketch below).
We combine this classifier with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z)
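A minimal Perceiver-style classifier sketch (illustrative shapes; random tensors stand in for USM frame features): a small set of learned latents cross-attends to a long frame sequence, so compute scales with the latent count rather than the input length.

```python
# Perceiver-style classifier sketch (not the paper's exact model): learned
# latents cross-attend to upstream speech-model frames, then self-attend,
# then pool for classification.
import torch
import torch.nn as nn

class PerceiverClassifier(nn.Module):
    def __init__(self, d_model=256, n_latents=32, n_classes=2):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(1, n_latents, d_model))
        self.cross = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.self_block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, seq_len, d_model) features from an upstream speech model
        lat = self.latents.expand(frames.size(0), -1, -1)
        lat, _ = self.cross(lat, frames, frames)   # latents attend to all frames
        lat = self.self_block(lat)                 # latent self-attention
        return self.head(lat.mean(dim=1))          # pool latents, classify

model = PerceiverClassifier()
logits = model(torch.randn(2, 1500, 256))  # e.g. a long utterance of frame features
```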
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings (see the sketch below).
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
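The contrastive setup can be sketched as a CLIP-style objective (the encoders are placeholders here): MEG windows and speech segments are embedded into a shared space, matching pairs are pulled together, and a test segment is identified by nearest neighbour among the candidates.

```python
# Hedged sketch of contrastive brain-to-speech decoding: an InfoNCE loss
# aligns MEG-window embeddings with matching speech-segment embeddings,
# and retrieval picks the highest-similarity candidate.
import torch
import torch.nn.functional as F

def info_nce(meg_emb: torch.Tensor, speech_emb: torch.Tensor, tau: float = 0.07):
    # meg_emb, speech_emb: (batch, d); row i of each forms a matching pair
    meg = F.normalize(meg_emb, dim=1)
    sp = F.normalize(speech_emb, dim=1)
    logits = meg @ sp.t() / tau                      # (batch, batch) similarities
    targets = torch.arange(len(meg))
    return F.cross_entropy(logits, targets)          # pull diagonal pairs together

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))  # one training step

# Retrieval at test time: score one MEG window against 1000 candidates.
meg = F.normalize(torch.randn(1, 128), dim=1)
candidates = F.normalize(torch.randn(1000, 128), dim=1)
best = (meg @ candidates.t()).argmax().item()        # predicted speech segment
```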
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training (see the sketch below).
Experiments on three tasks suggested that systems incorporating the generated articulatory features consistently outperform the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
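A schematic of the A2A pipeline described above, with placeholder dimensions and a toy inversion network: acoustic frames are mapped to articulatory features, which are then concatenated with the acoustics as an extra ASR input stream.

```python
# Illustrative audio-to-articulatory (A2A) sketch, not the paper's model:
# an inversion network, trained on parallel audio/UTI data, generates
# articulatory features that are fused with acoustic features for ASR.
import torch
import torch.nn as nn

acoustic_dim, artic_dim = 80, 24       # e.g. fbank frames -> UTI-derived features

inverter = nn.Sequential(               # toy A2A inversion model
    nn.Linear(acoustic_dim, 256), nn.ReLU(), nn.Linear(256, artic_dim))

frames = torch.randn(1, 500, acoustic_dim)          # one utterance of fbank frames
artic = inverter(frames)                            # generated articulatory features
asr_input = torch.cat([frames, artic], dim=-1)      # fused (1, 500, 104) ASR input
```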
- Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition [20.397149635457346]
Speech artifacts contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.
To enable further EEG research on speech production, a method using three-mode tensor decomposition is proposed (see the sketch below).
In a picture-naming task, we collected raw data with speech artifacts by placing two electrodes near the mouth to record lip EMG.
arXiv Detail & Related papers (2022-06-01T17:10:23Z)
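A sketch of the three-mode decomposition idea, assuming the tensorly package is available and using its CP decomposition; the EMG-correlation rule for picking artifact components is a stand-in for the paper's actual criterion:

```python
# Three-mode tensor decomposition for artifact removal (illustrative):
# stack epochs into a channels x time x trials tensor, CP-decompose it,
# drop components whose temporal signature tracks lip EMG, reconstruct.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 512, 40))   # channels x time x trials
lip_emg = rng.standard_normal((512, 40))   # reference artifact recording

weights, (chan_f, time_f, trial_f) = parafac(tl.tensor(eeg), rank=8)

# Keep components weakly correlated with the lip EMG reference.
keep = []
for r in range(8):
    comp_time = np.outer(time_f[:, r], trial_f[:, r])      # time x trials signature
    corr = abs(np.corrcoef(comp_time.ravel(), lip_emg.ravel())[0, 1])
    if corr < 0.3:                                          # hypothetical threshold
        keep.append(r)

clean = tl.cp_to_tensor((weights[keep],
                         [chan_f[:, keep], time_f[:, keep], trial_f[:, keep]]))
```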
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive, model-based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction (see the sketch below).
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
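Speed perturbation, the best-performing augmentation reported in the last entry, is a standard technique: resampling a waveform by a factor s changes both its duration and pitch. A minimal sketch using the commonly used 0.9/1.0/1.1 factors:

```python
# Minimal speed perturbation sketch: resample the waveform so that a
# factor > 1 speeds the audio up (shorter output) and < 1 slows it down.
import numpy as np
from scipy.signal import resample

def speed_perturb(wave: np.ndarray, factor: float) -> np.ndarray:
    """Return the waveform played back `factor` times faster."""
    return resample(wave, int(round(len(wave) / factor)))

wave = np.random.randn(16000)                 # 1 s of audio at 16 kHz
augmented = [speed_perturb(wave, s) for s in (0.9, 1.0, 1.1)]
```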
This list is automatically generated from the titles and abstracts of the papers on this site.