Refining Automatic Speech Recognition System for older adults
- URL: http://arxiv.org/abs/2011.08346v1
- Date: Tue, 17 Nov 2020 00:00:45 GMT
- Title: Refining Automatic Speech Recognition System for older adults
- Authors: Liu Chen, Meysam Asgari
- Abstract summary: We develop an ASR system for socially isolated seniors (80+ years old) with possible cognitive impairments.
We experimentally identify that ASR for the adult population performs poorly on our target population.
We further improve the system by leveraging an attention mechanism to utilize the model's intermediate information.
- Score: 7.3709604810699085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building a high-quality automatic speech recognition (ASR) system with
limited training data has been a challenging task, particularly for a narrow
target population. Open-source ASR systems, trained on ample data from
adults, degrade on seniors' speech because of the acoustic mismatch between
adults and seniors. With 12 hours of training data, we attempt to develop an
ASR system for socially isolated seniors (80+ years old) with possible
cognitive impairments. We experimentally identify that ASR for the adult
population performs poorly on our target population and that transfer learning (TL)
can boost the system's performance. Building on the fundamental idea of TL,
namely tuning model parameters, we further improve the system by leveraging an
attention mechanism to utilize the model's intermediate information. Our
approach achieves a 1.58% absolute improvement over the TL model.
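The abstract reports an "absolute" improvement without naming the metric. The sketch below assumes the metric is word error rate (WER), the usual choice for ASR, and shows how an absolute difference between two systems would be computed. The function names and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Plain difference between two WER values (assumed metric, see above)."""
    return baseline_wer - new_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    reference = "the cat sat on the mat"
    baseline = word_error_rate(reference, "the cat sit on mat")
    improved = word_error_rate(reference, "the cat sat on mat")
    print(f"baseline WER: {baseline:.3f}")
    print(f"improved WER: {improved:.3f}")
    print(f"absolute improvement: {absolute_improvement(baseline, improved):.3f}")
```

Under this reading, a 1.58% absolute improvement would mean the error rate drops by 1.58 percentage points, regardless of how high the baseline error rate is.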
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition [64.9816313630768]
Fine-tuning is often used to exploit models pre-trained on large quantities of non-aged and healthy speech.
This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus.
arXiv Detail & Related papers (2023-06-27T07:49:35Z)
- Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering [49.069298478971696]
We present a privacy preserving approach to improve fairness and robustness of end-to-end ASR.
We extract utterance level embeddings using a speaker ID model trained on a public dataset.
We use cluster IDs instead of speaker utterance embeddings as extra features during model training.
arXiv Detail & Related papers (2023-06-06T21:13:08Z)
- Automatic Severity Classification of Dysarthric speech by using Self-supervised Model with Multi-task Learning [4.947423926765435]
We propose a novel automatic severity assessment method for dysarthric speech using the self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is trained for two different tasks: severity classification and auxiliary automatic speech recognition (ASR).
Our model outperforms the traditional baseline methods, with a relative percentage increase of 1.25% for F1-score.
arXiv Detail & Related papers (2022-10-27T12:48:10Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- The NTNU System at the Interspeech 2020 Non-Native Children's Speech ASR Challenge [13.232899176888575]
This paper describes the Interspeech 2020 Non-Native Children's Speech ASR Challenge supported by the SIG-CHILD group of ISCA.
All participants were restricted to developing their systems solely on the speech and text corpora provided by the organizer.
To work around this under-resourced issue, we built our ASR system on top of CNN-TDNNF-based acoustic models.
arXiv Detail & Related papers (2020-05-18T02:51:26Z)
- Semi-supervised ASR by End-to-end Self-training [18.725686837244265]
We propose a self-training method with an end-to-end system for semi-supervised ASR.
We iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for immediate model update.
Our method gives 14.4% relative WER improvement over a carefully-trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 50%.
arXiv Detail & Related papers (2020-01-24T18:22:57Z)
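Several entries above quote WER gains in different forms: absolute reductions, relative improvements, and, for the self-training entry, the share of the gap to an oracle system that was closed. The sketch below shows how these figures relate to one another. The function names and all WER values are hypothetical and chosen only to make the arithmetic easy to follow; none of them come from the listed papers.

```python
def relative_improvement(base_wer: float, new_wer: float) -> float:
    """WER reduction expressed as a fraction of the base system's WER."""
    if base_wer <= 0:
        raise ValueError("base_wer must be positive")
    return (base_wer - new_wer) / base_wer


def gap_closed(base_wer: float, new_wer: float, oracle_wer: float) -> float:
    """Fraction of the base-to-oracle WER gap recovered by the new system."""
    gap = base_wer - oracle_wer
    if gap <= 0:
        raise ValueError("oracle_wer must be lower than base_wer")
    return (base_wer - new_wer) / gap


if __name__ == "__main__":
    # Hypothetical WERs in percent: base system, new system, oracle system.
    base, new, oracle = 10.0, 8.56, 7.12
    print(f"absolute reduction: {base - new:.2f} points")
    print(f"relative improvement: {relative_improvement(base, new):.1%}")
    print(f"gap to oracle closed: {gap_closed(base, new, oracle):.0%}")
```

With these made-up numbers, a 14.4% relative improvement that closes half of the gap implies an oracle that is about 28.8% better than the base system in relative terms. The same absolute reduction would be a much smaller relative gain on a system with a higher base WER, which is why the two figures are not interchangeable.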
This list is automatically generated from the titles and abstracts of the papers on this site.