Stuttering Speech Disfluency Prediction using Explainable Attribution
Vectors of Facial Muscle Movements
- URL: http://arxiv.org/abs/2010.01231v1
- Date: Fri, 2 Oct 2020 23:45:41 GMT
- Title: Stuttering Speech Disfluency Prediction using Explainable Attribution
Vectors of Facial Muscle Movements
- Authors: Arun Das, Jeffrey Mock, Henry Chacon, Farzan Irani, Edward Golob,
Peyman Najafirad
- Abstract summary: Speech disorders such as stuttering disrupt the normal fluency of speech by involuntary repetitions, prolongations and blocking of sounds and syllables.
Recent studies have explored automatic detection of stuttering using Artificial Intelligence (AI)-based algorithms on signals such as respiratory rate and audio recorded during speech.
We hypothesize that pre-speech facial activity in adults who stutter (AWS), which can be captured non-invasively, contains enough information to accurately classify the upcoming utterance as either fluent or stuttered.
- Score: 2.6540572249827514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech disorders such as stuttering disrupt the normal fluency of speech by
involuntary repetitions, prolongations and blocking of sounds and syllables. In
addition to these disruptions to speech fluency, most adults who stutter (AWS)
also experience numerous observable secondary behaviors before, during, and
after a stuttering moment, often involving the facial muscles. Recent studies
have explored automatic detection of stuttering using Artificial Intelligence
(AI)-based algorithms on signals such as respiratory rate and audio recorded
during speech. However, most methods require controlled environments and/or
invasive wearable sensors, and are unable to explain why a decision (fluent
vs. stuttered) was made. We hypothesize that pre-speech facial activity in AWS,
which can be captured non-invasively, contains enough information to accurately
classify the upcoming utterance as either fluent or stuttered. Towards this
end, this paper proposes a novel explainable AI (XAI) assisted convolutional
neural network (CNN) classifier to predict near future stuttering by learning
temporal facial muscle movement patterns of AWS and explains the important
facial muscles and actions involved. Statistical analyses reveal a
significantly high prevalence of cheek muscles (p<0.005) and lip muscles
(p<0.005) in predicting stuttering, and show behavior indicative of arousal
and anticipation to speak.
The temporal study of these upper and lower facial muscles may facilitate early
detection of stuttering, promote automated assessment of stuttering, and have
applications in behavioral therapies by providing automatic, non-invasive
feedback in real time.
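As a rough illustration of the approach the abstract describes, the following minimal sketch (in PyTorch) pairs a small 1-D CNN over a pre-speech window of facial action unit (AU) intensities with a simple gradient-times-input attribution that scores each AU's contribution. The layer sizes, the 17-AU input, the 90-frame window, and the attribution method are all illustrative assumptions, not the authors' actual architecture or XAI technique.

import torch
import torch.nn as nn

class PreSpeechCNN(nn.Module):
    """Hypothetical CNN over pre-speech facial action unit (AU) time series."""
    def __init__(self, n_aus: int = 17, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_aus, 32, kernel_size=5, padding=2),  # temporal filters over AU traces
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.classifier = nn.Linear(64, n_classes)  # fluent vs. stuttered

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_aus, time) -- AU intensities over the pre-speech window
        return self.classifier(self.features(x).squeeze(-1))

def au_attribution(model: nn.Module, x: torch.Tensor, target: int = 1) -> torch.Tensor:
    """Gradient-times-input attribution, summed over time: one score per AU."""
    x = x.clone().requires_grad_(True)
    model(x)[:, target].sum().backward()
    return (x.grad * x).sum(dim=-1)  # shape: (batch, n_aus)

model = PreSpeechCNN()
window = torch.randn(1, 17, 90)         # one fake 90-frame pre-speech window
scores = au_attribution(model, window)  # which AUs push toward "stuttered"

Aggregating such attribution vectors across utterances and subjects is what would let one ask which muscle groups (e.g., cheek vs. lip) dominate the prediction, in the spirit of the statistical analysis the abstract reports.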
Related papers
- Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems [55.99999020778169]
We study a function that can predict the forthcoming words and estimate the time remaining until the end of an utterance.
We develop a cross-attention-based algorithm that incorporates both acoustic and linguistic information.
Results demonstrate the proposed model's ability to predict upcoming words and estimate future EOU events up to 300ms prior to the actual EOU.
arXiv Detail & Related papers (2024-09-30T06:29:58Z)
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech [20.2646788350211]
Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases.
We describe Stutter-TTS, an end-to-end neural text-to-speech model capable of synthesizing diverse types of stuttering utterances.
arXiv Detail & Related papers (2022-11-04T23:45:31Z)
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation [61.564874831498145]
TranSpeech is a speech-to-speech translation model with bilateral perturbation.
We establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices.
TranSpeech significantly improves inference latency, enabling a speedup of up to 21.4x over the autoregressive technique.
arXiv Detail & Related papers (2022-05-25T06:34:14Z)
- Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 [0.22940141855172028]
Fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus boosts the effectiveness of the general-purpose features.
We evaluate our method on FluencyBank and the German therapy-centric Kassel State of Fluency dataset (a fine-tuning sketch in this spirit appears after this list).
arXiv Detail & Related papers (2022-04-07T13:02:12Z)
- KSoF: The Kassel State of Fluency Dataset -- A Therapy Centered Dataset of Stuttering [58.91587609873915]
This work introduces the Kassel State of Fluency (KSoF), a therapy-based dataset containing over 5500 clips of stuttered speech from persons who stutter (PWS).
The audio was recorded during therapy sessions at the Institut der Kasseler Stottertherapie.
arXiv Detail & Related papers (2022-03-10T14:17:07Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning.
arXiv Detail & Related papers (2020-12-30T15:09:02Z)
- Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
- Towards Automated Assessment of Stuttering and Stuttering Therapy [0.22940141855172028]
Stuttering is a complex speech disorder that can be identified by repetitions, prolongations of sounds, syllables or words, and blocks while speaking.
Common methods for the assessment of stuttering severity include percent stuttered syllables (% SS), the average of the three longest stuttering symptoms during a speech task, or the recently introduced Speech Efficiency Score (SES).
This paper introduces the Speech Control Index (SCI), a new method to evaluate the severity of stuttering.
arXiv Detail & Related papers (2020-06-16T14:50:56Z)
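As referenced in the wav2vec 2.0 entry above, here is a minimal sketch of fine-tuning a self-supervised speech model for utterance-level dysfluency classification with the Hugging Face transformers library. The checkpoint name, the binary label set, and the single training step are assumptions for illustration, not the cited paper's configuration.

import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Assumed base checkpoint; the cited work's exact model and corpus may differ.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=2  # fluent vs. dysfluent
)

waveform = torch.randn(16000 * 3)  # fake 3-second clip at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
labels = torch.tensor([1])  # 1 = dysfluent (hypothetical label convention)

loss = model(**inputs, labels=labels).loss  # cross-entropy on the clip label
loss.backward()  # one fine-tuning step; optimizer and data loop omitted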