Dual Learning for Large Vocabulary On-Device ASR
- URL: http://arxiv.org/abs/2301.04327v1
- Date: Wed, 11 Jan 2023 06:32:28 GMT
- Title: Dual Learning for Large Vocabulary On-Device ASR
- Authors: Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael
Picheny, Kyunghyun Cho
- Abstract summary: Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
- Score: 64.10124092250128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dual learning is a paradigm for semi-supervised machine learning that seeks
to leverage unsupervised data by solving two opposite tasks at once. In this
scheme, each model is used to generate pseudo-labels for unlabeled examples
that are used to train the other model. Dual learning has seen some use in
speech processing by pairing ASR and TTS as dual tasks. However, these results
mostly address only the case of using unpaired examples to compensate for very
small supervised datasets, and mostly on large, non-streaming models. Dual
learning has not yet been proven effective for using unsupervised data to
improve realistic on-device streaming models that are already trained on large
supervised corpora. We provide this missing piece through an analysis of an
on-device-sized streaming conformer trained on the entirety of Librispeech,
showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4%
with an LM.
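To make the pseudo-labeling loop concrete, below is a minimal sketch of dual training between an ASR model and a TTS model on unpaired data, as described in the abstract. The toy linear/embedding models, tensor shapes, losses, and optimizer settings are illustrative assumptions only and do not reflect the paper's streaming conformer or its actual TTS architecture.

```python
# Hypothetical sketch of the dual-learning loop: each model pseudo-labels
# unpaired data for the other. Shapes, models, and losses are placeholders.
import torch
import torch.nn as nn

VOCAB, FEAT_DIM, T_AUDIO, T_TEXT = 100, 80, 50, 20

asr_model = nn.Linear(FEAT_DIM, VOCAB)     # stand-in for the streaming conformer
tts_model = nn.Embedding(VOCAB, FEAT_DIM)  # stand-in for a TTS model

asr_opt = torch.optim.Adam(asr_model.parameters(), lr=1e-4)
tts_opt = torch.optim.Adam(tts_model.parameters(), lr=1e-4)
ce_loss, mse_loss = nn.CrossEntropyLoss(), nn.MSELoss()

# Unpaired data: audio without transcripts, text without audio (random here).
unpaired_audio = torch.randn(8, T_AUDIO, FEAT_DIM)
unpaired_text = torch.randint(0, VOCAB, (8, T_TEXT))

for step in range(10):
    # ASR pseudo-labels the unlabeled audio; TTS learns to reconstruct it.
    with torch.no_grad():
        pseudo_tokens = asr_model(unpaired_audio).argmax(dim=-1)
    tts_loss = mse_loss(tts_model(pseudo_tokens), unpaired_audio)
    tts_opt.zero_grad(); tts_loss.backward(); tts_opt.step()

    # TTS synthesizes features for the unlabeled text; ASR learns to recover it.
    with torch.no_grad():
        pseudo_audio = tts_model(unpaired_text)
    logits = asr_model(pseudo_audio)
    asr_loss = ce_loss(logits.reshape(-1, VOCAB), unpaired_text.reshape(-1))
    asr_opt.zero_grad(); asr_loss.backward(); asr_opt.step()
```

In the paper's setting, such a loop supplements standard supervised training on Librispeech rather than replacing it; the models here are already trained on the labeled corpus.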
Related papers
- A Dual Approach to Imitation Learning from Observations with Offline Datasets [19.856363985916644]
Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult.
We derive DILO, an algorithm that can leverage arbitrary suboptimal data to learn imitating policies without requiring expert actions.
arXiv Detail & Related papers (2024-06-13T04:39:42Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV)
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for sample erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Learning Slice-Aware Representations with Mixture of Attentions [38.74444452556773]
This work extends the recent slice-based learning (SBL) [Chen et al., 2019] with a mixture of attentions (MoA) to learn slice-aware attentive dual representations.
We empirically show that the MoA approach outperforms the baseline method as well as the original SBL approach on monitored slices with two natural language understanding tasks.
arXiv Detail & Related papers (2021-06-04T09:22:24Z) - Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Adversarial Examples for Unsupervised Machine Learning Models [71.81480647638529]
Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models.
We propose a framework of generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation.
arXiv Detail & Related papers (2021-03-02T17:47:58Z) - SEED: Self-supervised Distillation For Visual Representation [34.63488756535054]
We propose a new learning paradigm, named SElf-SupErvised Distillation (SEED), which leverages a larger network (as Teacher) to transfer its representational knowledge into a smaller architecture (as Student) in a self-supervised fashion.
We show that SEED dramatically boosts the performance of small networks on downstream tasks.
arXiv Detail & Related papers (2021-01-12T20:04:50Z) - Evolving Losses for Unsupervised Video Representation Learning [91.2683362199263]
We present a new method to learn video representations from large-scale unlabeled video data.
The proposed unsupervised representation learning results in a single RGB network and outperforms previous methods.
arXiv Detail & Related papers (2020-02-26T16:56:07Z)