Related papers: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

URL: http://arxiv.org/abs/2110.00165v2
Date: Mon, 4 Oct 2021 23:40:03 GMT
Title: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning
Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He
Abstract summary: We combine self- and semi-supervised learning methods to solve the unseen-domain adaptation problem in a large-scale production setting for an online ASR model. This approach demonstrates that using the source-domain data together with a small fraction (3%) of the target-domain data can recover the performance gap relative to a full-data baseline.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Self- and semi-supervised learning methods have been actively investigated to reduce the amount of labeled training data required or to enhance model performance. However, these approaches mostly focus on in-domain performance on public datasets. In this study, we combine self- and semi-supervised learning methods to solve the unseen-domain adaptation problem in a large-scale production setting for an online ASR model. This approach demonstrates that using the source-domain data together with a small fraction (3%) of the target-domain data can recover the performance gap relative to a full-data baseline: a relative 13.5% WER improvement on target-domain data.
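The headline result is stated as a relative WER improvement. As a minimal sketch (not the authors' evaluation code), word error rate is usually computed as the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, summed over a corpus and divided by the total number of reference words; a relative improvement is the WER reduction divided by the baseline WER. The function names, example sentences, and WER values below are hypothetical and chosen only for illustration; they are not numbers from the paper.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # reference word deleted
                          curr[j - 1] + 1,     # extra hypothesis word inserted
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction; 0.135 corresponds to a 13.5% relative improvement."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical utterances: one deletion ("the") and one substitution
    # ("lights" -> "light") against 8 reference words in total.
    refs = ["turn on the kitchen lights", "play some jazz"]
    hyps = ["turn on kitchen light", "play some jazz"]
    print(wer(refs, hyps))  # 0.25
    # Hypothetical WERs: 20.0% baseline vs. 17.3% adapted model.
    print(round(relative_wer_improvement(0.20, 0.173), 3))  # 0.135
```

Note that a relative improvement is not the same as an absolute one: in this made-up example, 13.5% relative corresponds to only 2.7 absolute WER points.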

Related papers

DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
We present Domain Impact-aware Data Sampling (DIDS) to optimize domain-level sampling strategies. DIDS achieves 3.4% higher average performance while maintaining comparable training efficiency.
arXiv (2025-04-17T13:09:38Z)
What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
Source-free domain adaptation (SFDA) adapts a model trained on a labeled source dataset so that it performs well on an unlabeled target dataset. This is especially important when the data distributions of the two domains differ substantially. We introduce a simple yet highly effective latent augmentation method tailored for contrastive SFDA.
arXiv (2024-12-18T20:09:46Z)
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition
Unsupervised domain adaptation (UDA) has become increasingly prevalent in scene text recognition (STR). We introduce the Stratified Domain Adaptation (StrDA) approach, which accounts for the gradual escalation of the domain gap during learning. We also propose a novel method that uses domain discriminators to estimate the out-of-distribution and domain-discriminative levels of data samples.
arXiv (2024-10-13T16:40:48Z)
Style Adaptation for Domain-adaptive Semantic Segmentation
Domain discrepancy substantially degrades the performance of models trained on source-domain data when they are applied to the target domain. We introduce a simple approach to mitigate this discrepancy that requires no additional parameter calculations and integrates seamlessly with self-training-based UDA methods. Our method achieves 76.93 mIoU on the GTA→Cityscapes benchmark, an improvement of 1.03 percentage points over the previous state of the art.
arXiv (2024-04-25T02:51:55Z)
Open-Set Domain Adaptation with Visual-Language Foundation Models
Unsupervised domain adaptation (UDA) has proven very effective at transferring knowledge from a labeled source domain to an unlabeled target domain. Because the target domain may also contain unknown classes that do not appear in the source domain, open-set domain adaptation (ODA) has emerged as a potential solution for identifying such classes during training.
arXiv (2023-07-30T11:38:46Z)
IDA: Informed Domain Adaptive Semantic Segmentation
We propose an Informed Domain Adaptation (IDA) model, a self-training framework that mixes data based on class-level segmentation performance. In our IDA model, class-level performance is tracked by an expected confidence score (ECS), and a dynamic schedule then determines the mixing ratio for data from different domains. Our method outperforms the state-of-the-art UDA-SS method by a margin of 1.1 mIoU in the adaptation of GTA-V to Cityscapes and of 0.9 mIoU in the adaptation of SYNTHIA to City
arXiv (2023-03-05T18:16:34Z)
MADAv2: Advanced Multi-Anchor Based Active Domain Adaptation Segmentation
We introduce active sample selection to assist domain adaptation for the semantic segmentation task. With only a small manual annotation effort for these samples, the distortion of the target-domain distribution can be effectively alleviated. We also propose a powerful semi-supervised domain adaptation strategy to alleviate the long-tail distribution problem.
arXiv (2023-01-18T07:55:22Z)
Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data
Domain adaptation is a popular paradigm in modern machine learning. We present a method called Domain Adaptation Principal Component Analysis (DAPCA). DAPCA finds a linear reduced data representation that is useful for solving the domain adaptation task.
arXiv (2022-08-28T21:10:56Z)
Gradual Domain Adaptation via Self-Training of Auxiliary Models
Domain adaptation becomes more challenging as the gap between source and target domains increases. We propose self-training of auxiliary models (AuxSelfTrain), which learns models for intermediate domains. Experiments on benchmark datasets for unsupervised and semi-supervised domain adaptation verify its efficacy.
arXiv (2021-06-18T03:15:25Z)
Generic Semi-Supervised Adversarial Subject Translation for Sensor-Based Human Activity Recognition
This paper presents a novel, generic, and robust approach for semi-supervised domain adaptation in Human Activity Recognition. It uses an adversarial framework to address the shortcomings of existing approaches, leveraging knowledge from annotated samples of the source subject only and from unlabeled samples of the target subject. The results demonstrate the effectiveness of the proposed algorithms over state-of-the-art methods, with improvements of up to 13%, 4%, and 13% in high-level activity recognition metrics on the Opportunity, LISSI, and PAMAP2 datasets, respectively.
arXiv (2020-11-11T12:16:23Z)
Domain Adaptation in LiDAR Semantic Segmentation by Aligning Class Distributions
This work addresses the problem of unsupervised domain adaptation for LiDAR semantic segmentation models. Our approach combines novel ideas on top of the current state-of-the-art approaches and yields new state-of-the-art results.
arXiv (2020-10-23T08:52:15Z)
Towards Fair Cross-Domain Adaptation via Generative Learning
Domain adaptation (DA) aims to adapt a model trained on a well-labeled source domain to an unlabeled target domain that follows a different distribution. We develop a novel Generative Few-shot Cross-domain Adaptation (GFCA) algorithm for fair cross-domain classification.
arXiv (2020-03-04T23:25:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.