The Gift of Feedback: Improving ASR Model Quality by Learning from User
Corrections through Federated Learning
- URL: http://arxiv.org/abs/2310.00141v2
- Date: Thu, 30 Nov 2023 21:05:43 GMT
- Authors: Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit
Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews
- Abstract summary: We seek to continually learn from on-device user corrections through Federated Learning (FL).
We explore techniques to target fresh terms that the model has not previously encountered, learn long-tail words, and mitigate catastrophic forgetting.
In experimental evaluations, we find that the proposed techniques improve model recognition of fresh terms, while preserving quality on the overall language distribution.
- Score: 20.643270151774182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition (ASR) models are typically trained on large
datasets of transcribed speech. As language evolves and new terms come into
use, these models can become outdated and stale. In the context of models
trained on the server but deployed on edge devices, errors may result from the
mismatch between server training data and actual on-device usage. In this work,
we seek to continually learn from on-device user corrections through Federated
Learning (FL) to address this issue. We explore techniques to target fresh
terms that the model has not previously encountered, learn long-tail words, and
mitigate catastrophic forgetting. In experimental evaluations, we find that the
proposed techniques improve model recognition of fresh terms, while preserving
quality on the overall language distribution.
Related papers
- How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario [72.02391485962127]
Speech Self-Supervised Learning (SSL) models achieve impressive performance on Automatic Speech Recognition (ASR).
In low-resource language ASR, they encounter the domain mismatch problem between pre-trained and low-resource languages.
We extend a conventional efficient fine-tuning scheme based on the adapter to handle these issues.
arXiv Detail & Related papers (2024-11-27T10:51:00Z)
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation [27.057810339120664]
We propose two techniques to improve context-aware ASR models.
On LibriSpeech, our techniques together reduce the rare word error rate by a relative 60% and 25% compared to no biasing and shallow fusion, respectively.
On SPGISpeech and a real-world dataset ConEC, our techniques also yield good improvements over the baselines.
arXiv Detail & Related papers (2024-07-14T19:32:33Z)
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach for Automatic Speech Recognition.
We use a memory-enhanced ASR model from the literature to decode new words from the slides.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution uses Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
- Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization [31.40751207207214]
Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives.
This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs.
Regularized models produce better counter narratives than state-of-the-art approaches in most cases.
arXiv Detail & Related papers (2023-09-05T15:27:22Z)
- Distributionally Robust Recurrent Decoders with Random Network Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z)
- Enabling On-Device Training of Speech Recognition Models with Federated Dropout [4.165917555996752]
Federated learning can be used to train machine learning models on the edge on local data that never leave devices.
We propose using federated dropout to reduce the size of client models while training a full-size model server-side.
arXiv Detail & Related papers (2021-10-07T17:22:40Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representation extraction or fine-tuning with downstream models.
arXiv Detail & Related papers (2020-07-12T16:19:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.