Dual Learning for Large Vocabulary On-Device ASR
- URL: http://arxiv.org/abs/2301.04327v1
- Date: Wed, 11 Jan 2023 06:32:28 GMT
- Title: Dual Learning for Large Vocabulary On-Device ASR
- Authors: Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael
Picheny, Kyunghyun Cho
- Abstract summary: Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
- Score: 64.10124092250128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dual learning is a paradigm for semi-supervised machine learning that seeks
to leverage unsupervised data by solving two opposite tasks at once. In this
scheme, each model is used to generate pseudo-labels for unlabeled examples
that are used to train the other model. Dual learning has seen some use in
speech processing by pairing ASR and TTS as dual tasks. However, these results
mostly address only the case of using unpaired examples to compensate for very
small supervised datasets, and mostly on large, non-streaming models. Dual
learning has not yet been proven effective for using unsupervised data to
improve realistic on-device streaming models that are already trained on large
supervised corpora. We provide this missing piece through an analysis of an
on-device-sized streaming conformer trained on the entirety of Librispeech,
showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4%
with an LM.
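To make the pseudo-labeling loop concrete, below is a minimal sketch of dual training between an ASR model and a TTS model on unpaired data, as described in the abstract. The toy linear/embedding models, tensor shapes, losses, and optimizer settings are illustrative assumptions only and do not reflect the paper's streaming conformer or its actual TTS architecture.

```python
# Hypothetical sketch of the dual-learning loop: each model pseudo-labels
# unpaired data for the other. Shapes, models, and losses are placeholders.
import torch
import torch.nn as nn

VOCAB, FEAT_DIM, T_AUDIO, T_TEXT = 100, 80, 50, 20

asr_model = nn.Linear(FEAT_DIM, VOCAB)     # stand-in for the streaming conformer
tts_model = nn.Embedding(VOCAB, FEAT_DIM)  # stand-in for a TTS model

asr_opt = torch.optim.Adam(asr_model.parameters(), lr=1e-4)
tts_opt = torch.optim.Adam(tts_model.parameters(), lr=1e-4)
ce_loss, mse_loss = nn.CrossEntropyLoss(), nn.MSELoss()

# Unpaired data: audio without transcripts, text without audio (random here).
unpaired_audio = torch.randn(8, T_AUDIO, FEAT_DIM)
unpaired_text = torch.randint(0, VOCAB, (8, T_TEXT))

for step in range(10):
    # ASR pseudo-labels the unlabeled audio; TTS learns to reconstruct it.
    with torch.no_grad():
        pseudo_tokens = asr_model(unpaired_audio).argmax(dim=-1)
    tts_loss = mse_loss(tts_model(pseudo_tokens), unpaired_audio)
    tts_opt.zero_grad(); tts_loss.backward(); tts_opt.step()

    # TTS synthesizes features for the unlabeled text; ASR learns to recover it.
    with torch.no_grad():
        pseudo_audio = tts_model(unpaired_text)
    logits = asr_model(pseudo_audio)
    asr_loss = ce_loss(logits.reshape(-1, VOCAB), unpaired_text.reshape(-1))
    asr_opt.zero_grad(); asr_loss.backward(); asr_opt.step()
```

In the paper's setting, such a loop supplements standard supervised training on Librispeech rather than replacing it; the models here are already trained on the labeled corpus.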
Related papers
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Learning Slice-Aware Representations with Mixture of Attentions [38.74444452556773]
This work extends the recent slice-based learning (SBL) [Chen et al., 2019] with a mixture of attentions (MoA) to learn slice-aware attentive dual representations.
We empirically show that the MoA approach outperforms the baseline method as well as the original SBL approach on monitored slices with two natural language understanding tasks.
arXiv Detail & Related papers (2021-06-04T09:22:24Z) - Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z) - SEED: Self-supervised Distillation For Visual Representation [34.63488756535054]
We propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), which leverages a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion.
We show that SEED dramatically boosts the performance of small networks on downstream tasks.
arXiv Detail & Related papers (2021-01-12T20:04:50Z) - Evolving Losses for Unsupervised Video Representation Learning [91.2683362199263]
We present a new method to learn video representations from large-scale unlabeled video data.
The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods.
arXiv Detail & Related papers (2020-02-26T16:56:07Z)