Related papers: SpikeVox: Towards Energy-Efficient Speech Therapy Framework with Spike-driven Generative Language Models

URL: http://arxiv.org/abs/2510.15566v1
Date: Fri, 17 Oct 2025 11:54:55 GMT
Title: SpikeVox: Towards Energy-Efficient Speech Therapy Framework with Spike-driven Generative Language Models
Authors: Rachmad Vidya Wicaksana Putra, Aadithyan Rajesh Nair, Muhammad Shafique
Abstract summary: SpikeVox is a novel framework for enabling energy-efficient speech therapy solutions. SpikeVox employs a speech recognition module to perform highly accurate speech-to-text conversion. It also generates suitable exercises for therapy and provides guidance on correct pronunciation as feedback.
Score: 3.1061484260786014
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Speech disorders can significantly affect patients' capability to communicate, learn, and socialize. However, existing speech therapy solutions (e.g., therapists or tools) are still limited and costly, so they remain inadequate for serving millions of patients worldwide. To address this, state-of-the-art methods employ neural network (NN) algorithms to help accurately detect speech disorders. However, these methods do not provide therapy recommendations as feedback, and hence offer only a partial solution for patients. Moreover, these methods incur high energy consumption due to their complex and resource-intensive NN processing, which hinders their deployment on low-power/energy platforms (e.g., smartphones). To this end, we propose SpikeVox, a novel framework for enabling energy-efficient speech therapy solutions through a spike-driven generative language model. Specifically, SpikeVox employs a speech recognition module to perform highly accurate speech-to-text conversion; leverages a spike-driven generative language model to efficiently perform pattern analysis for speech disorder detection and generate suitable exercises for therapy; provides guidance on correct pronunciation as feedback; and utilizes a REST API to enable seamless interaction for users. Experimental results demonstrate that SpikeVox achieves an 88% confidence level on average in speech disorder recognition, while providing complete feedback for therapy exercises. Therefore, SpikeVox provides a comprehensive framework for energy-efficient speech therapy solutions, and potentially addresses the significant global speech therapy access gap.

Related papers

Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition [8.838919369202525]
Speech impairments resulting from congenital disorders present major challenges to automatic speech recognition systems. State-of-the-art ASR models like Whisper still struggle with non-normative speech due to limited training data availability and high acoustic variability. This work introduces a novel ASR personalization method based on Bayesian Low-rank Adaptation for data-efficient fine-tuning.
arXiv Detail & Related papers (2025-09-23T13:44:58Z)
Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech [0.562479170374811]
Speech impairments caused by conditions such as cerebral palsy or genetic disorders pose significant challenges for automatic speech recognition systems. We propose a practical and lightweight pipeline to personalize ASR models, formalizing the selection of words and enriching a small, speech-impaired dataset with semantic coherence. Our approach shows promising improvements in transcription quality, demonstrating the potential to reduce communication barriers for individuals with atypical speech patterns.
arXiv Detail & Related papers (2025-06-23T15:30:50Z)
Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching [0.0]
Dysarthria is a neurological disorder that significantly impairs speech intelligibility. This necessitates the development of robust dysarthric-to-regular speech conversion techniques.
arXiv Detail & Related papers (2025-06-19T08:24:17Z)
Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 [0.22940141855172028]
Fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus boosts the effectiveness of the general-purpose features. We evaluate our method on Fluencybank and the German therapy-centric Kassel State of Fluency dataset.
arXiv Detail & Related papers (2022-04-07T13:02:12Z)
KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering [58.91587609873915]
This work introduces the Kassel State of Fluency (KSoF), a therapy-based dataset containing over 5500 clips of stuttering PWSs. The audio was recorded during therapy sessions at the Institut der Kasseler Stottertherapie.
arXiv Detail & Related papers (2022-03-10T14:17:07Z)
Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies. This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition. Both normal and disordered speech were exploited in the augmentation process. The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER)
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
STAN: A stuttering therapy analysis helper [59.37911277681339]
Stuttering is a complex speech disorder identified by repetitions, prolongations of sounds, syllables or words and blocks while speaking. We introduce STAN, a system to aid speech therapists in stuttering therapy sessions.
arXiv Detail & Related papers (2021-06-15T13:48:12Z)
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate. We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique. Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER)
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
Stutter Diagnosis and Therapy System Based on Deep Learning [2.3581263491506097]
Stuttering, also called stammering, is a communication disorder that breaks the continuity of the speech. This paper focuses on the implementation of a stutter diagnosis agent using Gated Recurrent CNN on MFCC audio features and therapy recommendation agent using SVM.
arXiv Detail & Related papers (2020-07-13T10:24:02Z)
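
Several of the summaries above report word error rate (WER) and character error rate (CER). As a reminder of what those figures measure, here is a minimal sketch of the standard definitions: the edit distance between reference and hypothesis, divided by the reference length. The function names (`edit_distance`, `wer`, `cer`) and the whitespace tokenization are illustrative assumptions, not taken from any of the listed papers, whose evaluation pipelines may normalize text differently.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: simple whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference. An "absolute" WER change, as quoted in one summary above, is a difference in percentage points between two such rates.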

This list is automatically generated from the titles and abstracts of the papers in this site.