Automatic Disfluency Detection from Untranscribed Speech
- URL: http://arxiv.org/abs/2311.00867v1
- Date: Wed, 1 Nov 2023 21:36:39 GMT
- Title: Automatic Disfluency Detection from Untranscribed Speech
- Authors: Amrit Romana, Kazuhito Koishida, Emily Mower Provost
- Abstract summary: Stuttering is a speech disorder characterized by a high rate of disfluencies.
Automatic disfluency detection may help in treatment planning for individuals who stutter.
We investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization.
- Score: 25.534535098405602
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech disfluencies, such as filled pauses or repetitions, are disruptions in
the typical flow of speech. Stuttering is a speech disorder characterized by a
high rate of disfluencies, but all individuals speak with some disfluencies and
the rates of disfluencies may be increased by factors such as cognitive load.
Clinically, automatic disfluency detection may help in treatment planning for
individuals who stutter. Outside of the clinic, automatic disfluency detection
may serve as a pre-processing step to improve natural language understanding in
downstream applications. With this wide range of applications in mind, we
investigate language, acoustic, and multimodal methods for frame-level
automatic disfluency detection and categorization. Each of these methods relies
on audio as an input. First, we evaluate several automatic speech recognition
(ASR) systems in terms of their ability to transcribe disfluencies, measured
using disfluency error rates. We then use these ASR transcripts as input to a
language-based disfluency detection model. We find that disfluency detection
performance is largely limited by the quality of transcripts and alignments. We
find that an acoustic-based approach that does not require transcription as an
intermediate step outperforms the ASR language approach. Finally, we present
multimodal architectures which we find improve disfluency detection performance
over the unimodal approaches. Ultimately, this work introduces novel approaches
for automatic frame-level disfluency detection and categorization. In the long term, this
will help researchers incorporate automatic disfluency detection into a range
of applications.
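
The frame-level acoustic approach can be pictured as a per-frame classifier on top of a pretrained speech encoder. The sketch below is not the authors' implementation; it assumes a wav2vec 2.0-style encoder from HuggingFace Transformers and a hypothetical five-class disfluency label set, and the paper's actual categories, encoder choice, and training details may differ.

```python
# Minimal sketch only (not the paper's released code): frame-level disfluency
# tagging directly from audio, assuming a wav2vec 2.0-style encoder.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

# Hypothetical label set; the paper categorizes disfluencies per frame.
LABELS = ["fluent", "filled_pause", "repetition", "prolongation", "block"]

class FrameDisfluencyTagger(nn.Module):
    def __init__(self, encoder_name="facebook/wav2vec2-base", num_labels=len(LABELS)):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, waveform):
        # waveform: (batch, samples) at 16 kHz; the encoder emits ~20 ms frames
        frames = self.encoder(waveform).last_hidden_state  # (batch, T, hidden)
        return self.classifier(frames)                     # (batch, T, num_labels)

# Example: tag one second of (silent) audio frame by frame
model = FrameDisfluencyTagger()
logits = model(torch.zeros(1, 16000))
predicted = logits.argmax(dim=-1)  # per-frame disfluency class indices
```

In the multimodal setting described in the abstract, per-frame acoustic representations of this kind would presumably be combined with language features derived from aligned ASR transcripts before classification.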
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Augmenting Automatic Speech Recognition Models with Disfluency Detection [12.45703869323415]
Speech disfluency commonly occurs in conversational and spontaneous speech.
Current research mainly focuses on detecting disfluencies within transcripts, overlooking their exact location and duration in the speech.
We present an inference-only approach to augment any ASR model with the ability to detect open-set disfluencies.
arXiv Detail & Related papers (2024-09-16T11:13:14Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification [0.0]
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments.
By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate rich acoustic and clean transcripts.
We then apply several natural language processing methods to extract features from these transcripts to produce prototypes of healthy speech.
arXiv Detail & Related papers (2023-08-02T15:53:59Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z)
- The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification [8.032686410648274]
Linguistic anomalies detectable in spontaneous speech have shown promise for various clinical applications including screening for dementia and other forms of cognitive impairment.
The impressive performance of self-supervised learning (SSL) automatic speech recognition (ASR) models with curated speech data is not apparent with challenging speech samples from clinical settings.
One of our key findings is that, paradoxically, ASR systems with relatively high error rates can produce transcripts that result in better downstream classification accuracy than classification based on verbatim transcripts.
arXiv Detail & Related papers (2022-11-11T17:06:45Z)
- Exploiting prompt learning with pre-trained language models for Alzheimer's Disease detection [70.86672569101536]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delaying further progression.
This paper investigates the use of prompt-based fine-tuning of PLMs that consistently uses AD classification errors as the training objective function.
arXiv Detail & Related papers (2022-10-29T09:18:41Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English speaking PWA, the system's performance accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Auxiliary Sequence Labeling Tasks for Disfluency Detection [6.460424516393765]
We propose a method utilizing named entity recognition (NER) and part-of-speech (POS) as auxiliary sequence labeling (SL) tasks for disfluency detection.
We show that training a disfluency detection model with auxiliary SL tasks can improve its F-score in disfluency detection.
Experimental results on the widely used English Switchboard dataset show that our method outperforms the previous state-of-the-art in disfluency detection.
arXiv Detail & Related papers (2020-10-24T02:51:17Z)
- End-to-End Speech Recognition and Disfluency Removal [15.910282983166024]
This paper investigates the task of end-to-end speech recognition and disfluency removal.
We show that end-to-end models do learn to directly generate fluent transcripts.
We propose two new metrics that can be used for evaluating integrated ASR and disfluency models.
arXiv Detail & Related papers (2020-09-22T03:11:37Z)