Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
- URL: http://arxiv.org/abs/2204.03417v1
- Date: Thu, 7 Apr 2022 13:02:12 GMT
- Title: Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
- Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Korbinian Riedhammer
- Abstract summary: Fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus boosts the effectiveness of the general-purpose features.
We evaluate our method on FluencyBank and the German therapy-centric Kassel State of Fluency dataset.
- Score: 0.22940141855172028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stuttering is a varied speech disorder that harms an individual's
communication ability. Persons who stutter (PWS) often use speech therapy to
cope with their condition. Improving speech recognition systems for people with
such non-typical speech or tracking the effectiveness of speech therapy would
require systems that can detect dysfluencies while at the same time being able
to detect speech techniques acquired in therapy.
This paper shows that fine-tuning wav2vec 2.0 for the classification of
stuttering on a sizeable English corpus containing stuttered speech, in
conjunction with multi-task learning, boosts the effectiveness of the
general-purpose wav2vec 2.0 features for detecting stuttering in speech, both
within and across languages. We evaluate our method on FluencyBank and the
German therapy-centric Kassel State of Fluency (KSoF) dataset by training
Support Vector Machine classifiers using features extracted from the fine-tuned
models for six different stuttering-related event types: blocks,
prolongations, sound repetitions, word repetitions, interjections, and,
specific to therapy, speech modifications. Using embeddings from the
fine-tuned models leads to relative classification performance gains of up to
27% w.r.t. F1-score.
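To make the pipeline described in the abstract concrete, the sketch below shows one plausible way to pool wav2vec 2.0 hidden states into clip-level embeddings and train a Support Vector Machine for a single dysfluency event type. This is a minimal sketch, not the authors' released code: the checkpoint name, file lists, and labels are placeholders, and the paper fine-tunes wav2vec 2.0 on stuttered English speech with multi-task learning before extracting features, whereas this sketch loads a generic pretrained model.

```python
# Minimal sketch (assumption-laden, not the authors' implementation):
# pool wav2vec 2.0 hidden states into one embedding per clip and train a
# binary SVM for a single stuttering-related event type.
import torch
import torchaudio
from sklearn.svm import SVC
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Placeholder checkpoint; the paper uses a model fine-tuned on stuttered English speech.
CHECKPOINT = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CHECKPOINT)
model = Wav2Vec2Model.from_pretrained(CHECKPOINT).eval()


def embed(path: str) -> torch.Tensor:
    """Mean-pool the last hidden states of one audio clip into a single vector."""
    wave, sr = torchaudio.load(path)
    wave = torchaudio.functional.resample(wave, sr, 16_000).mean(dim=0)  # mono, 16 kHz
    inputs = extractor(wave.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, frames, hidden_size)
    return hidden.mean(dim=1).squeeze(0)            # (hidden_size,)


def train_event_classifier(files, labels):
    """Train one SVM for a single event type (e.g. blocks); labels are 0/1 per clip."""
    X = torch.stack([embed(f) for f in files]).numpy()
    clf = SVC(kernel="rbf", class_weight="balanced")
    clf.fit(X, labels)
    return clf

# Hypothetical usage: file lists and labels would come from KSoF or FluencyBank metadata.
# block_clf = train_event_classifier(train_files, block_labels)
```

In the paper, separate classifiers are trained for each of the six event types listed above, so extending the sketch amounts to repeating the SVM training once per label column.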
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- MMSD-Net: Towards Multi-modal Stuttering Detection [9.257985820122999]
MMSD-Net is the first multi-modal neural framework for stuttering detection.
Our model yields an improvement of 2-17% in the F1-score over existing state-of-the-art uni-modal approaches.
arXiv Detail & Related papers (2024-07-16T08:26:59Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation [94.80029087828888]
Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST.
Direct S2ST suffers from data scarcity because parallel corpora pairing source-language speech with target-language speech are very rare.
We propose in this paper a Speech2S model, which is jointly pre-trained with unpaired speech and bilingual text data for direct speech-to-speech translation tasks.
arXiv Detail & Related papers (2022-10-31T02:55:51Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggest that systems incorporating the generated articulatory features consistently outperform the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition [15.136348385992047]
This study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
We train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model.
Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance.
arXiv Detail & Related papers (2022-04-04T17:36:01Z)
- KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering [58.91587609873915]
This work introduces the Kassel State of Fluency (KSoF), a therapy-based dataset containing over 5500 clips of PWS (persons who stutter).
The audio was recorded during therapy sessions at the Institut der Kasseler Stottertherapie.
arXiv Detail & Related papers (2022-03-10T14:17:07Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Stutter Diagnosis and Therapy System Based on Deep Learning [2.3581263491506097]
Stuttering, also called stammering, is a communication disorder that breaks the continuity of speech.
This paper focuses on the implementation of a stutter diagnosis agent using a Gated Recurrent CNN on MFCC audio features and a therapy recommendation agent using an SVM.
arXiv Detail & Related papers (2020-07-13T10:24:02Z)
- Towards Automated Assessment of Stuttering and Stuttering Therapy [0.22940141855172028]
Stuttering is a complex speech disorder that can be identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking.
Common methods for the assessment of stuttering severity include percent stuttered syllables (% SS), the average of the three longest stuttering symptoms during a speech task, or the recently introduced Speech Efficiency Score (SES).
This paper introduces the Speech Control Index (SCI), a new method to evaluate the severity of stuttering.
arXiv Detail & Related papers (2020-06-16T14:50:56Z)