Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and
Dysarthric Speech Recognition
- URL: http://arxiv.org/abs/2306.15265v1
- Date: Tue, 27 Jun 2023 07:49:35 GMT
- Title: Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and
Dysarthric Speech Recognition
- Authors: Tianzi Wang, Shoukang Hu, Jiajun Deng, Zengrui Jin, Mengzhe Geng, Yi
Wang, Helen Meng, Xunying Liu
- Abstract summary: Fine-tuning is often used to exploit models pre-trained on large quantities of non-aged, healthy speech.
This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus.
- Score: 64.9816313630768
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic recognition of disordered and elderly speech remains a highly
challenging task to date due to data scarcity. Parameter fine-tuning is often
used to exploit models pre-trained on large quantities of non-aged and healthy
speech, while neural architecture hyper-parameters are set using expert
knowledge and remain unchanged. This paper investigates hyper-parameter
adaptation for Conformer ASR systems that are pre-trained on the Librispeech
corpus before being domain adapted to the DementiaBank elderly and UASpeech
dysarthric speech datasets. Experimental results suggest that hyper-parameter
adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over
parameter-only fine-tuning on the DBank and UASpeech tasks, respectively. An
intuitive correlation is found between the performance improvements from
hyper-parameter domain adaptation and the relative utterance length ratio
between the source and target domain data.
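The abstract reports its gains as WER reductions but does not say here whether the 0.45% and 0.67% figures are absolute or relative. As a minimal sketch of how such figures are typically computed, the following defines WER as word-level edit distance divided by reference length, and shows both kinds of reduction. The function names and the example numbers are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - adapted_wer


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper).
baseline, adapted = 30.0, 29.4
abs_gain = absolute_reduction(baseline, adapted)
rel_gain = relative_reduction(baseline, adapted)
```

When the baseline WER is high, as is common for dysarthric and elderly speech, a small absolute reduction corresponds to an even smaller relative one, so it helps to state which of the two is being reported.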
Related papers
- R+R: Understanding Hyperparameter Effects in DP-SGD [3.0668784884950235]
DP-SGD is the standard optimization algorithm for privacy-preserving machine learning.
It is still commonly challenged by low performance compared to non-private learning approaches.
arXiv Detail & Related papers (2024-11-04T12:56:35Z)
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
The first is speaker-regularized spectral basis embedding (SBE) features, which exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
The second is feature-based learning hidden unit contributions (f-LHUC), conditioned on VR-LH features that are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Hypernetworks for Personalizing ASR to Atypical Speech [7.486694572792521]
We propose a novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics.
We show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
arXiv Detail & Related papers (2024-06-06T16:39:00Z)
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
- Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition [30.885165674448352]
This paper presents a novel set of speaker-dependent generative adversarial network (GAN) based data augmentation approaches for elderly and dysarthric speech recognition.
GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute.
Consistent performance improvements are retained after applying LHUC based speaker adaptation.
arXiv Detail & Related papers (2022-05-13T04:29:49Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.