Nonverbal Sound Detection for Disordered Speech
- URL: http://arxiv.org/abs/2202.07750v1
- Date: Tue, 15 Feb 2022 22:02:58 GMT
- Title: Nonverbal Sound Detection for Disordered Speech
- Authors: Colin Lea, Zifang Huang, Dhruv Jain, Lauren Tooley, Zeinab Liaghat,
Shrinath Thelapurath, Leah Findlater, Jeffrey P. Bigham
- Abstract summary: We introduce an alternative voice-based input system which relies on sound event detection using fifteen nonverbal mouth sounds.
This system was designed to work regardless of one's speech abilities and allows full access to existing technology.
- Score: 24.636175845214822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice assistants have become an essential tool for people with various
disabilities because they enable complex phone- or tablet-based interactions
without the need for fine-grained motor control, such as with touchscreens.
However, these systems are not tuned for the unique characteristics of
individuals with speech disorders, including many of those who have a
motor-speech disorder, are deaf or hard of hearing, have a severe stutter, or
are minimally verbal. We introduce an alternative voice-based input system
which relies on sound event detection using fifteen nonverbal mouth sounds like
"pop," "click," or "eh." This system was designed to work regardless of ones'
speech abilities and allows full access to existing technology. In this paper,
we describe the design of a dataset, model considerations for real-world
deployment, and efforts towards model personalization. Our fully-supervised
model achieves segment-level precision and recall of 88.6% and 88.4% on an
internal dataset of 710 adults, while achieving 0.31 false positives per hour
on aggressors such as speech. Five-shot personalization enables satisfactory
performance in 84.5% of cases where the generic model fails.
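To make the reported evaluation concrete, the sketch below shows one plausible way to compute segment-level precision/recall and false positives per hour for a sound event detector. The `Event` structure, the any-overlap matching rule, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of segment-level evaluation for a nonverbal sound event
# detector. The matching rule (any temporal overlap with the same label
# counts as a hit) is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Event:
    label: str      # e.g. "pop", "click", "eh"
    start: float    # seconds
    end: float      # seconds

def overlaps(a: Event, b: Event) -> bool:
    """Two segments match if they share a label and overlap in time."""
    return a.label == b.label and a.start < b.end and b.start < a.end

def segment_metrics(predicted: list[Event], reference: list[Event]):
    """Segment-level precision and recall under the overlap matching rule."""
    tp_pred = sum(any(overlaps(p, r) for r in reference) for p in predicted)
    tp_ref = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_ref / len(reference) if reference else 0.0
    return precision, recall

def false_positives_per_hour(predicted: list[Event],
                             reference: list[Event],
                             audio_hours: float) -> float:
    """Detections with no matching reference event, normalized per hour."""
    fps = sum(not any(overlaps(p, r) for r in reference) for p in predicted)
    return fps / audio_hours

# Toy usage: one correct "pop", one spurious "click" in 0.5 h of audio.
ref = [Event("pop", 1.20, 1.45)]
pred = [Event("pop", 1.22, 1.40), Event("click", 7.80, 7.95)]
p, r = segment_metrics(pred, ref)
print(f"precision={p:.2f} recall={r:.2f}")       # precision=0.50 recall=1.00
print(false_positives_per_hour(pred, ref, 0.5))  # 2.0 FP/h
```

Measuring false positives per hour separately on "aggressor" audio such as conversational speech, which should trigger no events at all, corresponds to calling the last helper with an empty reference list; that is the setting in which the paper reports 0.31 FP/h.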
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Latent Phrase Matching for Dysarthric Speech [23.23672790496787]
Many consumer speech recognition systems are not tuned for people with speech disabilities.
We propose a query-by-example-based personalized phrase recognition system that is trained using small amounts of speech (see the query-by-example sketch after this list).
Performance degrades as the number of phrases increases, but consistently outperforms ASR systems when trained with 50 unique phrases.
arXiv Detail & Related papers (2023-06-08T17:28:28Z)
- Play it by Ear: Learning Skills amidst Occlusion through Audio-Visual Imitation Learning [62.83590925557013]
We learn a set of challenging partially-observed manipulation tasks from visual and audio inputs.
Our proposed system learns these tasks by combining offline imitation learning from tele-operated demonstrations and online finetuning.
In a set of simulated tasks, we find that our system benefits from using audio, and that by using online interventions we are able to improve the success rate of offline imitation learning by 20%.
arXiv Detail & Related papers (2022-05-30T04:52:58Z)
- Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices [15.136348385992047]
We train several voice conversion models using self-supervised speech representations.
Converted voices retain a low word error rate within 1% of the original voice.
Experiments on dysarthric speech data show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices.
arXiv Detail & Related papers (2022-04-04T17:48:01Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive, deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets [0.0]
We trained personalized models for 195 individuals with different types and severities of speech impairment.
For the home automation scenario, 79% of speakers reached the target WER with 18-20 minutes of speech, and even with only 3-4 minutes of speech, 63% of speakers reached the target WER.
arXiv Detail & Related papers (2021-10-09T17:11:17Z)
- Analysis and Tuning of a Voice Assistant System for Dysfluent Speech [7.233685721929227]
Speech recognition systems do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks.
We show that by tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24% (relative) for individuals with fluency disorders.
arXiv Detail & Related papers (2021-06-18T20:58:34Z)
- On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech [9.698986579582236]
We present an approach to on-device based ASR personalization with very small amounts of speaker-specific data.
We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker.
arXiv Detail & Related papers (2021-06-18T17:48:08Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
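As referenced in the Latent Phrase Matching entry above, here is a minimal sketch of the query-by-example idea: a handful of enrolled recordings per phrase serve as templates, and an incoming utterance is matched by dynamic time warping (DTW) over per-frame features. The feature representation, cosine frame distance, and decision threshold are assumptions for illustration, not that paper's method.

```python
# Minimal query-by-example sketch: match an utterance against enrolled phrase
# templates via dynamic time warping (DTW) over per-frame features.
import numpy as np

def frame_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two feature frames (illustrative choice)."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def dtw_cost(x: np.ndarray, y: np.ndarray) -> float:
    """Length-normalized DTW alignment cost between two [frames, dims] arrays."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def recognize(utterance: np.ndarray,
              templates: dict[str, list[np.ndarray]],
              threshold: float = 0.35):
    """Return the best-matching enrolled phrase, or None if nothing is close.

    templates maps each phrase to a handful of enrolled feature arrays,
    mirroring personalization from small amounts of speech.
    """
    best_phrase, best_cost = None, np.inf
    for phrase, examples in templates.items():
        for example in examples:
            c = dtw_cost(utterance, example)
            if c < best_cost:
                best_phrase, best_cost = phrase, c
    return best_phrase if best_cost <= threshold else None

# Toy usage with random "features"; a real system would use e.g. MFCCs or
# learned embeddings extracted from the enrolled recordings.
rng = np.random.default_rng(0)
templates = {"turn on the lights": [rng.normal(size=(40, 13))],
             "call my sister": [rng.normal(size=(35, 13))]}
query = templates["call my sister"][0] + 0.1 * rng.normal(size=(35, 13))
print(recognize(query, templates))  # likely "call my sister"
```

Because the templates come directly from the user's own recordings, the matcher is inherently personalized and needs no large training set, which is the appeal of query-by-example approaches for disordered speech.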
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.