Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and
Dysarthric Speech Recognition
- URL: http://arxiv.org/abs/2306.15265v1
- Date: Tue, 27 Jun 2023 07:49:35 GMT
- Title: Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and
Dysarthric Speech Recognition
- Authors: Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi
Wang, Helen Meng, Xunying Liu
- Abstract summary: Parameter fine-tuning is often used to exploit models pre-trained on large quantities of non-aged, healthy speech.
This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus.
- Score: 64.9816313630768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic recognition of disordered and elderly speech remains a highly
challenging task to date due to data scarcity. Parameter fine-tuning is often
used to exploit models pre-trained on large quantities of non-aged, healthy
speech, while neural architecture hyper-parameters are set using expert
knowledge and remain unchanged. This paper investigates hyper-parameter
adaptation for Conformer ASR systems that are pre-trained on the Librispeech
corpus before being domain adapted to the DementiaBank elderly and UASpeech
dysarthric speech datasets. Experimental results suggest that hyper-parameter
adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over
parameter-only fine-tuning on DBank and UASpeech tasks respectively. An
intuitive correlation is found between the performance improvements by
hyper-parameter domain adaptation and the relative utterance length ratio
between the source and target domain data.
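The results above are reported as word error rate (WER) reductions. As a reminder of what that metric measures, here is a minimal sketch assuming the standard definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring setup used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    real ASR scoring pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER reduction is then the difference between the WER values of two systems scored on the same test set.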
Related papers
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
The first is speaker-regularized spectral basis embedding (SBE) features that exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
The second is feature-based learning hidden unit contributions (f-LHUC) that are conditioned on VR-LH features that are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
- Hypernetworks for Personalizing ASR to Atypical Speech [7.486694572792521]
We propose a novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics.
We show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
arXiv Detail & Related papers (2024-06-06T16:39:00Z)
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
- Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition [30.885165674448352]
This paper presents a novel set of speaker-dependent generative adversarial network (GAN) based data augmentation approaches for elderly and dysarthric speech recognition.
GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute.
Consistent performance improvements are retained after applying LHUC based speaker adaptation.
arXiv Detail & Related papers (2022-05-13T04:29:49Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech, the paper proposes novel spectro-temporal subspace basis deep embedding features derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)