Hypernetworks for Personalizing ASR to Atypical Speech
- URL: http://arxiv.org/abs/2406.04240v4
- Date: Tue, 2 Jul 2024 19:51:54 GMT
- Title: Hypernetworks for Personalizing ASR to Atypical Speech
- Authors: Max Müller-Eberstein, Dianna Yee, Karren Yang, Gautam Varma Mantena, Colin Lea
- Abstract summary: We propose a novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics.
We show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
- Score: 7.486694572792521
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) for personalizing automatic speech recognition (ASR) has recently shown promise for adapting general population models to atypical speech. However, these approaches assume a priori knowledge of the atypical speech disorder being adapted for -- the diagnosis of which requires expert knowledge that is not always available. Even given this knowledge, data scarcity and high inter/intra-speaker variability further limit the effectiveness of traditional fine-tuning. To circumvent these challenges, we first identify the minimal set of model parameters required for ASR adaptation. Our analysis of each individual parameter's effect on adaptation performance allows us to reduce Word Error Rate (WER) by half while adapting 0.03% of all weights. Alleviating the need for cohort-specific models, we next propose the novel use of a meta-learned hypernetwork to generate highly individualized, utterance-level adaptations on-the-fly for a diverse set of atypical speech characteristics. Evaluating adaptation at the global, cohort and individual-level, we show that hypernetworks generalize better to out-of-distribution speakers, while maintaining an overall relative WER reduction of 75.2% using 0.1% of the full parameter budget.
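As a rough illustration of the core idea, here is a minimal PyTorch sketch of a hypernetwork that maps an utterance-level embedding to low-rank (LoRA-style) adapter weights for a single frozen layer. All names, dimensions, and the LoRA-style parameterization are illustrative assumptions for the sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AdapterHypernetwork(nn.Module):
    """Sketch: map an utterance embedding to low-rank adapter weights for
    one frozen linear layer (assumed parameterization, not the paper's
    exact design)."""

    def __init__(self, utt_dim=256, layer_dim=512, rank=4):
        super().__init__()
        self.layer_dim, self.rank = layer_dim, rank
        # One small MLP emits both low-rank factors A and B.
        self.net = nn.Sequential(
            nn.Linear(utt_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * layer_dim * rank),
        )

    def forward(self, utt_embedding):
        # utt_embedding: (batch, utt_dim), e.g. a pooled encoder state.
        a, b = self.net(utt_embedding).chunk(2, dim=-1)
        A = a.view(-1, self.layer_dim, self.rank)  # (batch, d, r)
        B = b.view(-1, self.rank, self.layer_dim)  # (batch, r, d)
        return A @ B  # per-utterance weight update, (batch, d, d)

# Usage: apply the generated delta on the fly; the base layer stays frozen.
hyper = AdapterHypernetwork()
frozen = nn.Linear(512, 512)
for p in frozen.parameters():
    p.requires_grad_(False)

x = torch.randn(2, 512)    # one frame per utterance, for brevity
utt = torch.randn(2, 256)  # utterance-level embedding
delta_W = hyper(utt)
y = frozen(x) + torch.bmm(delta_W, x.unsqueeze(-1)).squeeze(-1)
```

Because the hypernetwork generates adapter weights on demand rather than storing a fine-tuned copy per speaker or cohort, one small network can serve arbitrary speakers while the adapted weights remain a tiny fraction of the full model, consistent with the 0.1% parameter budget quoted above.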
Related papers
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-regularized spectral basis embedding (SBE) features exploit a special regularization term to enforce homogeneity of speaker features during adaptation.
Feature-based learning hidden unit contributions (f-LHUC) are conditioned on VR-LH features, which are shown to be insensitive to speaker-level data quantity in test-time adaptation.
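For readers unfamiliar with LHUC-style adaptation, the following is a minimal PyTorch sketch of feature-conditioned hidden-unit scaling. The 2·sigmoid amplitude parameterization is a standard LHUC choice, and the feature dimension is an assumption; neither is taken from the paper itself.

```python
import torch
import torch.nn as nn

class FeatureConditionedLHUC(nn.Module):
    """Sketch of f-LHUC: a small network predicts per-hidden-unit scaling
    amplitudes from a speaker-level feature vector, instead of learning
    them separately for each speaker."""

    def __init__(self, feat_dim=64, hidden_dim=512):
        super().__init__()
        self.scale_net = nn.Linear(feat_dim, hidden_dim)

    def forward(self, hidden, speaker_feat):
        # hidden: (batch, time, hidden_dim); speaker_feat: (batch, feat_dim)
        r = self.scale_net(speaker_feat)        # (batch, hidden_dim)
        scale = 2.0 * torch.sigmoid(r)          # amplitudes in (0, 2)
        return hidden * scale.unsqueeze(1)      # rescale each hidden unit

lhuc = FeatureConditionedLHUC()
h = torch.randn(2, 100, 512)  # encoder hidden states
feat = torch.randn(2, 64)     # e.g. SBE-style speaker features
h_adapted = lhuc(h, feat)
```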
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
- Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work.
Our empirical investigation includes tens of thousands of models trained with all combinations of the studied optimizers and parameterizations.
We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
- Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition [64.9816313630768]
Fine-tuning is often used to exploit models pre-trained on large quantities of non-aged and healthy speech.
This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus.
arXiv Detail & Related papers (2023-06-27T07:49:35Z)
- Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems [31.813788489512394]
This paper proposes a novel factorised speaker-environment adaptive training and test time adaptation approach for Conformer ASR models.
Experiments on the 300-hr WHAM noise corrupted Switchboard data suggest that factorised adaptation consistently outperforms the baseline.
Further analysis shows the proposed method offers potential for rapid adaptation to unseen speaker-environment conditions.
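A minimal sketch of what factorised speaker-environment adaptation could look like, assuming LHUC-style multiplicative scales for each factor; the multiplicative composition and embedding sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FactorisedAdaptation(nn.Module):
    """Sketch: separate speaker and environment embeddings produce
    hidden-unit scales that are composed multiplicatively, so unseen
    speaker-environment pairings can be assembled at test time."""

    def __init__(self, n_speakers=10, n_envs=4, hidden_dim=512):
        super().__init__()
        self.speaker = nn.Embedding(n_speakers, hidden_dim)
        self.env = nn.Embedding(n_envs, hidden_dim)

    def forward(self, hidden, spk_id, env_id):
        # hidden: (batch, time, hidden_dim)
        s = 2.0 * torch.sigmoid(self.speaker(spk_id))  # speaker factor
        e = 2.0 * torch.sigmoid(self.env(env_id))      # environment factor
        return hidden * (s * e).unsqueeze(1)

adapt = FactorisedAdaptation()
h = torch.randn(2, 100, 512)
h_adapted = adapt(h, torch.tensor([0, 3]), torch.tensor([1, 2]))
```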
arXiv Detail & Related papers (2023-06-26T11:32:05Z)
- Differentially Private Adapters for Parameter Efficient Acoustic Modeling [24.72748979633543]
We introduce a noisy teacher-student ensemble into a conventional adaptation scheme.
We insert residual adapters between layers of the frozen pre-trained acoustic model.
Our solution reduces the number of trainable parameters by 97.5% using the residual adapters (RAs).
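The residual adapter (RA) pattern itself is standard; below is a minimal PyTorch sketch of an adapter inserted after a frozen layer. The bottleneck width is an illustrative assumption, and the differentially private noisy teacher-student training is omitted.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Sketch of a residual adapter: a small bottleneck MLP with a skip
    connection, zero-initialized so it starts as a near-identity."""

    def __init__(self, dim=512, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # identity mapping at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

# Usage: only the adapter trains; the pre-trained layer stays frozen.
frozen_layer = nn.Linear(512, 512)
for p in frozen_layer.parameters():
    p.requires_grad_(False)

adapter = ResidualAdapter()
y = adapter(frozen_layer(torch.randn(2, 512)))
```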
arXiv Detail & Related papers (2023-05-19T00:36:43Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive, model-based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models [132.90062129639705]
We propose a novel training strategy that encourages all parameters to be trained sufficiently.
A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate.
In contrast, a parameter with high sensitivity is well-trained, so we regularize it by decreasing its learning rate to prevent overfitting.
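A minimal sketch of the sensitivity-guided idea, assuming the common first-order sensitivity approximation |θ · ∂L/∂θ| (the estimated loss change from zeroing a parameter) and a simple inverse-scaling rule; the paper's exact schedule may differ.

```python
import torch
import torch.nn.functional as F

def sensitivity_scaled_step(params, base_lr=1e-3, eps=1e-12):
    """Sketch: give low-sensitivity (redundant) parameters a larger step
    and high-sensitivity (well-trained) ones a smaller step. The
    normalization and clamp are illustrative assumptions."""
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            sens = (p * p.grad).abs()           # first-order sensitivity
            scale = sens.mean() / (sens + eps)  # invert: low sens -> big lr
            p -= base_lr * scale.clamp(max=10.0) * p.grad

# Usage after a backward pass:
model = torch.nn.Linear(8, 1)
x, y = torch.randn(4, 8), torch.randn(4, 1)
F.mse_loss(model(x), y).backward()
sensitivity_scaled_step(model.parameters())
```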
arXiv Detail & Related papers (2022-02-06T00:22:28Z)
- A Unified Speaker Adaptation Approach for ASR [37.76683818356052]
We propose a unified speaker adaptation approach consisting of feature adaptation and model adaptation.
For feature adaptation, we employ a speaker-aware persistent memory model which generalizes better to unseen test speakers.
For model adaptation, we use a novel gradual pruning method to adapt to target speakers without changing the model architecture.
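A minimal sketch of pruning-based model adaptation, under the assumption that "gradual pruning" keeps only the largest-magnitude weight changes relative to the pretrained model while a sparsity target ramps up over adaptation; the paper's exact criterion and schedule may differ.

```python
import torch

def prune_adaptation_deltas(adapted, pretrained, sparsity):
    """Sketch: reset all but the largest weight *changes* to their
    pretrained values, so only a sparse subset of weights becomes
    speaker-specific and the architecture is unchanged."""
    with torch.no_grad():
        for p_new, p_old in zip(adapted.parameters(),
                                pretrained.parameters()):
            delta = p_new - p_old
            k = int(sparsity * delta.numel())  # number of deltas to reset
            if k == 0:
                continue
            threshold = delta.abs().flatten().kthvalue(k).values
            mask = delta.abs() > threshold
            p_new.copy_(p_old + delta * mask)

# Usage: fine-tune `adapted` on target-speaker data, then prune with a
# sparsity that ramps up across adaptation rounds (e.g. 0.5 -> 0.95).
pretrained = torch.nn.Linear(16, 16)
adapted = torch.nn.Linear(16, 16)
adapted.load_state_dict(pretrained.state_dict())
prune_adaptation_deltas(adapted, pretrained, sparsity=0.9)
```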
arXiv Detail & Related papers (2021-10-16T10:48:52Z)
- Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences.
Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness.
This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
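A minimal sketch of modeling SD parameter uncertainty with a Gaussian variational posterior and the reparameterization trick; treating LHUC-style scales as the SD parameters and using a standard-normal prior are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BayesianSDScale(nn.Module):
    """Sketch: each speaker-dependent scaling parameter gets a Gaussian
    posterior (mu, sigma) instead of a point estimate; sampling it makes
    the adaptation robust when target-speaker data is scarce."""

    def __init__(self, hidden_dim=512):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(hidden_dim))
        self.log_sigma = nn.Parameter(torch.full((hidden_dim,), -3.0))

    def forward(self, hidden):
        sigma = self.log_sigma.exp()
        r = self.mu + sigma * torch.randn_like(sigma)  # reparameterization
        return hidden * 2.0 * torch.sigmoid(r)         # LHUC-style scaling

    def kl_to_standard_normal(self):
        # KL(q || N(0, I)) term added to the adaptation loss.
        sigma2 = (2.0 * self.log_sigma).exp()
        return 0.5 * (sigma2 + self.mu**2 - 1.0 - 2.0 * self.log_sigma).sum()

layer = BayesianSDScale()
h_adapted = layer(torch.randn(2, 100, 512))
loss_reg = 1e-3 * layer.kl_to_standard_normal()
```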
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.