Geometry of orofacial neuromuscular signals: speech articulation decoding using surface electromyography
- URL: http://arxiv.org/abs/2411.02591v3
- Date: Sun, 05 Oct 2025 18:45:15 GMT
- Authors: Harshavardhana T. Gowda, Zachary D. McNaughton, Lee M. Miller
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective. In this article, we present data and methods for decoding speech articulations using surface electromyogram (EMG) signals. EMG-based speech neuroprostheses offer a promising approach for restoring audible speech in individuals who have lost the ability to speak intelligibly due to laryngectomy, neuromuscular diseases, stroke, or trauma-induced damage (e.g., from radiotherapy) to the speech articulators.
Approach. To achieve this, we collect EMG signals from the face, jaw, and neck as subjects articulate speech, and we perform EMG-to-speech translation.
Main results. Our findings reveal that the manifold of symmetric positive definite (SPD) matrices serves as a natural embedding space for EMG signals. Specifically, we provide an algebraic interpretation of the manifold-valued EMG data using linear transformations, and we analyze and quantify distribution shifts in EMG signals across individuals.
Significance. Overall, our approach demonstrates significant potential for developing neural networks that are both data- and parameter-efficient, an important consideration for EMG-based systems, which face challenges in large-scale data collection and operate under limited computational resources on embedded devices.
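The abstract's SPD-manifold claim can be made concrete: a multichannel EMG window is embedded as its (regularized) channel covariance matrix, which is symmetric positive definite, and windows are then compared with a Riemannian metric rather than a Euclidean one. The sketch below is illustrative only, not the authors' released code; the 8-channel montage, window length, regularization constant, and the choice of the affine-invariant metric are assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): embed EMG windows as
# SPD covariance matrices and compare them with the affine-invariant
# Riemannian distance d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F.
import numpy as np
from scipy.linalg import eigvalsh

def spd_embedding(emg_window, eps=1e-6):
    """Map a (channels x samples) EMG window to a regularized SPD covariance."""
    cov = np.cov(emg_window)                  # (C, C) sample covariance
    return cov + eps * np.eye(cov.shape[0])   # jitter guarantees positive definiteness

def affine_invariant_distance(A, B):
    """Riemannian distance on the SPD manifold via generalized eigenvalues:
    the eigenvalues of (B, A) equal those of A^{-1/2} B A^{-1/2}."""
    lam = eigvalsh(B, A)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# Two hypothetical 8-channel, 500-sample EMG windows
rng = np.random.default_rng(0)
x1 = rng.standard_normal((8, 500))
x2 = rng.standard_normal((8, 500))
print(affine_invariant_distance(spd_embedding(x1), spd_embedding(x2)))
```

Distances of this form are invariant under congruence transformations (A and B map to W A Wᵀ and W B Wᵀ for invertible W), which is one reason SPD embeddings are attractive for EMG, where electrode montage acts approximately as a linear transformation of underlying muscle sources.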
Related papers
- E^2-LLM: Bridging Neural Signals and Interpretable Affective Analysis [54.763420895859035]
We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG.
E^2-LLM integrates a pretrained EEG encoder with Q-based LLMs through learnable projection layers, employing a multi-stage training pipeline.
Experiments on the dataset across seven emotion categories demonstrate that E^2-LLM achieves excellent performance on emotion classification.
arXiv Detail & Related papers (2026-01-11T13:21:20Z) - NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models [66.91449452840318]
We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer.
Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding (a toy RVQ sketch appears after this list); and (iii) an EEG signal phase- and amplitude-aware loss function for efficient training.
Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks.
arXiv Detail & Related papers (2025-10-15T01:26:52Z) - WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities [55.00677513249723]
EEG signals simultaneously encode both cognitive processes and intrinsic neural states.
We map EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation.
The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations.
arXiv Detail & Related papers (2025-09-26T06:21:51Z) - A Silent Speech Decoding System from EEG and EMG with Heterogenous Electrode Configurations [0.20075899678041528]
We introduce neural networks that can handle EEG/EMG with heterogeneous electrode placements.
We show strong performance in silent speech decoding via multi-task training on large-scale EEG/EMG datasets.
arXiv Detail & Related papers (2025-06-16T07:57:35Z) - Articulatory Feature Prediction from Surface EMG during Speech Production [25.10685431811405]
We present a model for predicting articulatory features from surface electromyography (EMG) signals during speech production.
The proposed model integrates convolutional layers and a Transformer block, followed by separate predictors for articulatory features.
We demonstrate that these predicted articulatory features can be decoded into intelligible speech waveforms.
arXiv Detail & Related papers (2025-05-20T01:50:05Z) - Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer [9.914613096064848]
Decoding speech from electroencephalogram (EEG) is challenging due to a limited understanding of neural pronunciation mapping.
In this study, we developed a large-scale multi-utterance speech EEG dataset from 57 right-handed native English-speaking subjects.
Our results reveal distinct speech neural features through visualization of FAST-generated activation maps across frontal and temporal brain regions.
arXiv Detail & Related papers (2025-04-02T10:38:08Z) - Towards Dynamic Neural Communication and Speech Neuroprosthesis Based on Viseme Decoding [25.555303640695577]
Decoding text, speech, or images from human neural signals holds promising potential both as neuroprosthesis for patients and as innovative communication tools.
We developed a diffusion model-based framework to decode visual speech intentions from speech-related non-invasive brain signals.
We successfully reconstructed coherent lip movements, effectively bridging the gap between brain signals and dynamic visual interfaces.
arXiv Detail & Related papers (2025-01-09T04:47:27Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.
This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities.
We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - Wearable intelligent throat enables natural speech in stroke patients with dysarthria [18.380855184550775]
Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments.
We present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors.
In tests with five stroke patients with dysarthria, IT's LLM agents intelligently corrected token errors and enriched sentence-level emotional and logical coherence.
arXiv Detail & Related papers (2024-11-27T12:03:52Z) - NeuGPT: Unified multi-modal Neural GPT [48.70587003475798]
NeuGPT is a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research.
Our model mainly focuses on brain-to-text decoding, improving SOTA from 6.94 to 12.92 on BLEU-1 and 6.93 to 13.06 on ROUGE-1F.
It can also simulate brain signals, thereby serving as a novel neural interface.
arXiv Detail & Related papers (2024-10-28T10:53:22Z) - Empowering Dysarthric Speech: Leveraging Advanced LLMs for Accurate Speech Correction and Multimodal Emotion Analysis [0.0]
This paper introduces a novel approach to recognizing and translating dysarthric speech.
We leverage advanced large language models for accurate speech correction and multimodal emotion analysis.
Our framework identifies emotions such as happiness, sadness, neutrality, surprise, anger, and fear, while reconstructing intended sentences from distorted speech with high accuracy.
arXiv Detail & Related papers (2024-10-13T20:54:44Z) - SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z) - NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention [47.8479647938849]
We present a neuro-guided speaker extraction model, i.e., NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue.
We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations (a toy cross-attention sketch appears after this list).
arXiv Detail & Related papers (2024-09-04T07:33:01Z) - Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z) - DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs [12.234206036041218]
We use the deep neural network (DNN) DeepSpeech2 as a paradigm to investigate how natural input and cochlear implant-based inputs are processed over time.
We generate naturalistic and cochlear implant-like inputs from spoken sentences and test the similarity of model performance to human performance.
We find that dynamics over time in each layer are affected by context as well as input type.
arXiv Detail & Related papers (2024-07-30T04:32:27Z) - Topology of surface electromyogram signals: hand gesture decoding on Riemannian manifolds [0.0]
We present data and methods for decoding hand gestures using surface electromyogram (EMG) signals.
EMG-based upper limb interfaces are valuable for amputee rehabilitation, artificial supernumerary limb augmentation, gestural control of computers, and virtual and augmented reality applications.
arXiv Detail & Related papers (2023-11-14T21:20:54Z) - Brain-Driven Representation Learning Based on Diffusion Model [25.375490061512]
Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
arXiv Detail & Related papers (2023-11-14T05:59:58Z) - Inner speech recognition through electroencephalographic signals [2.578242050187029]
This work focuses on inner speech recognition starting from EEG signals.
The decoding of the EEG into text should be understood as the classification of a limited number of words (commands).
Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.
arXiv Detail & Related papers (2022-10-11T08:29:12Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities (a toy retrieval sketch appears after this list).
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z) - Synthesized Speech Detection Using Convolutional Transformer-Based Spectrogram Analysis [16.93803259128475]
Synthesized speech can be used for nefarious purposes, including creating a purported speech signal and attributing it to someone who did not speak the content of the signal.
In this paper, we analyze speech signals in the form of spectrograms with a Compact Convolutional Transformer for synthesized speech detection.
arXiv Detail & Related papers (2022-05-03T22:05:35Z) - DriPP: Driven Point Processes to Model Stimuli Induced Patterns in M/EEG Signals [62.997667081978825]
We develop a novel statistical point process model, called driven temporal point processes (DriPP).
We derive a fast and principled expectation-maximization (EM) algorithm to estimate the parameters of this model.
Results on standard MEG datasets demonstrate that our methodology reveals event-related neural responses.
arXiv Detail & Related papers (2021-12-08T13:07:21Z) - Heterogeneous Hand Guise Classification Based on Surface Electromyographic Signals Using Multichannel Convolutional Neural Network [0.0]
Recent developments in the field of Machine Learning allow us to use EMG signals to teach machines the complex properties of human movements.
Modern machines are capable of detecting numerous human activities and distinguishing among them solely based on the EMG signals produced by those activities.
In this study, a novel classification method has been described employing a multichannel Convolutional Neural Network (CNN) that interprets surface EMG signals by the properties they exhibit in the power domain.
arXiv Detail & Related papers (2021-01-17T17:02:04Z) - Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
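As background for the hierarchical residual vector quantization (RVQ) codebooks mentioned in the NeuroRVQ entry above, here is a minimal sketch of plain RVQ: each stage quantizes the residual left by the previous stage against its own codebook. This is not NeuroRVQ's implementation; the codebook sizes, embedding dimension, and greedy nearest-neighbor assignment are assumptions for illustration.

```python
# Illustrative residual vector quantization (RVQ): stage k encodes the
# residual remaining after stages 1..k-1; with trained codebooks, each
# stage refines the reconstruction. Sizes are arbitrary for this sketch.
import numpy as np

def rvq_encode(x, codebooks):
    """Return per-stage code indices and the reconstructed vector."""
    residual = x.copy()
    quantized = np.zeros_like(x)
    codes = []
    for cb in codebooks:                                  # cb: (K, D) codebook
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        quantized += cb[idx]
        residual -= cb[idx]                               # pass residual to next stage
    return codes, quantized

rng = np.random.default_rng(1)
codebooks = [rng.standard_normal((256, 16)) for _ in range(3)]  # 3 stages, K=256, D=16
x = rng.standard_normal(16)
codes, x_hat = rvq_encode(x, codebooks)
print(codes, np.linalg.norm(x - x_hat))
```

With random codebooks the reconstruction is crude; in a trained tokenizer the codebooks are learned so that a few stages suffice for high-resolution encoding.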
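The NeuroSpex entry above pairs an EEG encoder with a cross-attention (CA) mechanism that enhances speech feature representations. Below is a toy single-head cross-attention in that spirit; the arrangement (queries from speech features, keys and values from EEG features) and all dimensions are assumptions, and the actual NeuroSpex wiring may differ.

```python
# Toy single-head cross-attention: speech-frame queries attend over EEG
# frames, so each speech feature is re-weighted by listener EEG context.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(speech, eeg, Wq, Wk, Wv):
    """speech: (T_s, D), eeg: (T_e, D); returns (T_s, D)."""
    Q, K, V = speech @ Wq, eeg @ Wk, eeg @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (T_s, T_e) attention weights
    return attn @ V                                  # EEG-informed speech features

rng = np.random.default_rng(2)
D = 32
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out = cross_attention(rng.standard_normal((50, D)),   # 50 speech frames
                      rng.standard_normal((200, D)),  # 200 EEG frames
                      Wq, Wk, Wv)
print(out.shape)  # (50, 32)
```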
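Finally, the MEG decoding entry above frames decoding as retrieval: given about 3 seconds of MEG, select the matching speech segment from over 1,000 candidates. The sketch below shows only that retrieval step, with untrained random projections standing in for the paper's contrastively trained encoders; the sensor count, segment dimension, and pool size are assumptions.

```python
# Toy brain-to-speech retrieval: embed the brain recording and every
# candidate speech segment into a shared space, then rank candidates by
# cosine similarity. Random projections stand in for trained encoders.
import numpy as np

rng = np.random.default_rng(3)
D_BRAIN, D_SPEECH, D_EMB, N_CANDIDATES = 208, 120, 64, 1000

W_brain = rng.standard_normal((D_EMB, D_BRAIN)) / np.sqrt(D_BRAIN)
W_speech = rng.standard_normal((D_EMB, D_SPEECH)) / np.sqrt(D_SPEECH)

def embed(x, W):
    z = W @ x
    return z / np.linalg.norm(z)              # unit norm, so dot product = cosine

speech_pool = rng.standard_normal((N_CANDIDATES, D_SPEECH))
brain_rec = rng.standard_normal(D_BRAIN)

emb_pool = speech_pool @ W_speech.T                          # (N, D_EMB)
emb_pool /= np.linalg.norm(emb_pool, axis=1, keepdims=True)
scores = emb_pool @ embed(brain_rec, W_brain)                # cosine score per candidate
print("top-ranked candidate:", int(np.argmax(scores)))
```

In the paper this ranking is made meaningful by contrastive training, which pulls matched brain/speech pairs together in the shared space; here the scores are random by construction.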
This list is automatically generated from the titles and abstracts of the papers on this site.