Self-supervised Speech Models for Word-Level Stuttered Speech Detection
- URL: http://arxiv.org/abs/2409.10704v1
- Date: Mon, 16 Sep 2024 20:18:20 GMT
- Title: Self-supervised Speech Models for Word-Level Stuttered Speech Detection
- Authors: Yi-Jen Shih, Zoi Gkalitsiou, Alexandros G. Dimakis, David Harwath,
- Abstract summary: We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
- Score: 66.46810024006712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical diagnosis of stuttering requires an assessment by a licensed speech-language pathologist. However, this process is time-consuming and requires clinicians with training and experience in stuttering and fluency disorders. Unfortunately, only a small percentage of speech-language pathologists report being comfortable working with individuals who stutter, which is not enough to serve the 80 million individuals who stutter worldwide. Developing machine learning models for detecting stuttered speech would enable universal and automated screening for stuttering, allowing speech pathologists to identify and follow up with patients who are most likely to be diagnosed with a stuttering speech disorder. Previous research in this area has predominantly focused on utterance-level detection, which is not sufficient for clinical settings where word-level annotation of stuttering is the norm. In this study, we curated a stuttered speech dataset with word-level annotations and introduced a word-level stuttering speech detection model leveraging self-supervised speech models. Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection. Additionally, we conducted an extensive ablation analysis of our method, providing insight into the most important aspects of adapting self-supervised speech models for stuttered speech detection.
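The abstract does not describe how utterance audio is turned into per-word decisions. A minimal sketch of one common pattern, offered only as an illustration and not as the authors' method: assume frame-level features from a self-supervised encoder at 50 frames per second (wav2vec 2.0 and HuBERT emit one frame per 20 ms of 16 kHz audio) and word time stamps from a forced aligner, mean-pool the frames inside each word, and score each pooled vector with a binary classifier. The function names, the mean-pooling choice, and the linear-probe weights are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the SSL encoder emits one feature frame every 20 ms (50 per second).
FRAME_RATE_HZ = 50.0


def word_level_features(
    frames: Sequence[Sequence[float]],
    word_spans: Sequence[Tuple[float, float]],
    frame_rate: float = FRAME_RATE_HZ,
) -> List[List[float]]:
    """Mean-pool frame features over each word's (start, end) span in seconds."""
    if not frames:
        raise ValueError("need at least one feature frame")
    dim = len(frames[0])
    pooled = []
    for start, end in word_spans:
        # Map times to frame indices, clamped so every word covers at least one frame.
        first = min(max(0, int(start * frame_rate)), len(frames) - 1)
        last = min(max(first + 1, round(end * frame_rate)), len(frames))
        segment = frames[first:last]
        pooled.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return pooled


def classify_words(
    pooled: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> List[bool]:
    """Hypothetical linear probe: True means the word is flagged as stuttered."""
    labels = []
    for vec in pooled:
        logit = sum(w * x for w, x in zip(weights, vec)) + bias
        prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        labels.append(prob >= threshold)
    return labels


if __name__ == "__main__":
    # Toy example: 4 two-dimensional frames at 2 frames/s, two one-second words.
    toy_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    words = word_level_features(toy_frames, [(0.0, 1.0), (1.0, 2.0)], frame_rate=2.0)
    print(words)  # [[1.0, 2.0], [5.0, 6.0]]
    print(classify_words(words, [1.0, 0.0], -3.0))  # [False, True]
```

Mean pooling is only one option; attention pooling over the word's frames, or frame-level labels aggregated per word, would slot into the same place.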
Related papers
- Impact of Speech Mode in Automatic Pathological Speech Detection [14.011517808456892]
This paper analyzes the influence of speech mode on pathological speech detection approaches.
It examines two categories of approaches, i.e., classical machine learning and deep learning.
Results indicate that classical approaches may struggle to capture pathology-discriminant cues in spontaneous speech.
In contrast, deep learning approaches demonstrate superior performance, managing to extract additional cues that were previously inaccessible in non-spontaneous speech.
(2024-06-14T12:19:18Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose removing the reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
(2024-06-12T16:30:58Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
(2022-08-25T10:01:43Z)
- Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 [0.22940141855172028]
Fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus boosts the effectiveness of the general-purpose features.
We evaluate our method on FluencyBank and the German therapy-centric Kassel State of Fluency dataset.
(2022-04-07T13:02:12Z)
- KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering [58.91587609873915]
This work introduces the Kassel State of Fluency (KSoF), a therapy-based dataset containing over 5,500 clips of persons who stutter (PWSs).
The audio was recorded during therapy sessions at the Institut der Kasseler Stottertherapie.
(2022-03-10T14:17:07Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
A speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, adversarial speaker adaptation (ASA).
(2022-02-18T08:59:36Z)
- Machine Learning for Stuttering Identification: Review, Challenges & Future Directions [9.726119468893721]
Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds.
Recent developments in machine and deep learning have revolutionized the speech domain.
This work aims to fill the gap by bringing together researchers from interdisciplinary fields.
(2021-07-08T18:15:20Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on the LibriSpeechMix dataset, a multi-talker dataset derived from LibriSpeech, and present encouraging results.
(2021-04-05T18:37:33Z)
- Stutter Diagnosis and Therapy System Based on Deep Learning [2.3581263491506097]
Stuttering, also called stammering, is a communication disorder that breaks the continuity of speech.
This paper focuses on the implementation of a stutter diagnosis agent using a Gated Recurrent CNN on MFCC audio features and a therapy recommendation agent using an SVM.
(2020-07-13T10:24:02Z)
- Towards Automated Assessment of Stuttering and Stuttering Therapy [0.22940141855172028]
Stuttering is a complex speech disorder that can be identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking.
Common methods for the assessment of stuttering severity include percent stuttered syllables (%SS), the average of the three longest stuttering symptoms during a speech task, and the recently introduced Speech Efficiency Score (SES).
This paper introduces the Speech Control Index (SCI), a new method to evaluate the severity of stuttering.
(2020-06-16T14:50:56Z)
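The first two severity measures named in the last summary reduce to simple arithmetic. A minimal sketch, assuming syllable counts and per-event stuttering durations (in seconds) have already been extracted from a transcribed speech sample; the function names are hypothetical, and averaging over fewer than three events when fewer were observed is an assumption, not part of any clinical standard cited here.

```python
from typing import Sequence


def percent_stuttered_syllables(stuttered: int, total: int) -> float:
    """%SS: stuttered syllables as a percentage of all syllables spoken."""
    if total <= 0:
        raise ValueError("total syllable count must be positive")
    if not 0 <= stuttered <= total:
        raise ValueError("stuttered count must lie in [0, total]")
    return 100.0 * stuttered / total


def mean_of_three_longest(durations_s: Sequence[float]) -> float:
    """Average duration of the three longest stuttering events.

    Assumption: if fewer than three events occurred, average the ones observed.
    """
    if not durations_s:
        raise ValueError("need at least one stuttering event")
    longest = sorted(durations_s, reverse=True)[:3]
    return sum(longest) / len(longest)


if __name__ == "__main__":
    print(percent_stuttered_syllables(12, 300))  # 4.0
    print(mean_of_three_longest([0.5, 2.0, 1.0, 3.0]))  # 2.0
```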