Data Techniques For Online End-to-end Speech Recognition
- URL: http://arxiv.org/abs/2001.09221v2
- Date: Sun, 26 Jul 2020 22:58:12 GMT
- Title: Data Techniques For Online End-to-end Speech Recognition
- Authors: Yang Chen, Weiran Wang, I-Fan Chen, Chao Wang
- Abstract summary: Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data.
While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue.
We explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain.
- Score: 17.621967685914587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Practitioners often need to build ASR systems for new use cases in a short
amount of time, given limited in-domain data. While recently developed
end-to-end methods largely simplify the modeling pipelines, they still suffer
from the data sparsity issue. In this work, we explore a few
simple-to-implement techniques for building online ASR systems in an end-to-end
fashion, with a small amount of transcribed data in the target domain. These
techniques include data augmentation in the target domain, domain adaptation
using models previously trained on a large source domain, and knowledge
distillation on non-transcribed target domain data, using an adapted
bi-directional model as the teacher; they are applicable in real scenarios with
different types of resources. Our experiments demonstrate that each technique
is independently useful in the improvement of the online ASR performance in the
target domain.
Related papers
- Understanding the Cross-Domain Capabilities of Video-Based Few-Shot Action Recognition Models [3.072340427031969]
Few-shot action recognition (FSAR) aims to learn a model capable of identifying novel actions in videos using only a few examples.
In assuming the base dataset seen during meta-training and novel dataset used for evaluation can come from different domains, cross-domain few-shot learning alleviates data collection and annotation costs.
We systematically evaluate existing state-of-the-art single-domain, transfer-based, and cross-domain FSAR methods on new cross-domain tasks.
arXiv Detail & Related papers (2024-06-03T07:48:18Z) - Adapting to Distribution Shift by Visual Domain Prompt Generation [34.19066857066073]
We adapt a model at test-time using a few unlabeled data to address distribution shifts.
We build a knowledge bank to learn the transferable knowledge from source domains.
The proposed method outperforms previous work on 5 large-scale benchmarks including WILDS and DomainNet.
arXiv Detail & Related papers (2024-05-05T02:44:04Z) - One-Shot Domain Adaptive and Generalizable Semantic Segmentation with
Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z) - Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z) - Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z) - Flexible deep transfer learning by separate feature embeddings and
manifold alignment [0.0]
Object recognition is a key enabler across industry and defense.
Unfortunately, algorithms trained on existing labeled datasets do not directly generalize to new data because the data distributions do not match.
We propose a novel deep learning framework that overcomes this limitation by learning separate feature extractions for each domain.
arXiv Detail & Related papers (2020-12-22T19:24:44Z) - Multi-Domain Adversarial Feature Generalization for Person
Re-Identification [52.835955258959785]
We propose a multi-dataset feature generalization network (MMFA-AAE)
It is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to unseen' camera systems.
It also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2020-11-25T08:03:15Z) - Learning causal representations for robust domain adaptation [31.261956776418618]
In many real-world applications, target domain data may not always be available.
In this paper, we study the cases where at the training phase the target domain data is unavailable.
We propose a novel Causal AutoEncoder (CAE), which integrates deep autoencoder and causal structure learning into a unified model.
arXiv Detail & Related papers (2020-11-12T11:24:03Z) - A Review of Single-Source Deep Unsupervised Visual Domain Adaptation [81.07994783143533]
Large-scale labeled training datasets have enabled deep neural networks to excel across a wide range of benchmark vision tasks.
In many applications, it is prohibitively expensive and time-consuming to obtain large quantities of labeled data.
To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled or unlabeled target domain.
arXiv Detail & Related papers (2020-09-01T00:06:50Z) - Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [70.79442700890843]
We propose a novel Dynamic Fusion Network (DF-Net) which automatically exploit the relevance between the target domain and each domain.
With little training data, we show its transferability by outperforming prior best model by 13.9% on average.
arXiv Detail & Related papers (2020-04-23T08:17:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.