Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
- URL: http://arxiv.org/abs/2506.04981v1
- Date: Thu, 05 Jun 2025 12:53:20 GMT
- Title: Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering
- Authors: Andres Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esau Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke
- Abstract summary: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. We propose an incremental semi-supervised learning pipeline that integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain.
- Score: 11.50314008820538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. But unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over no auxiliary data. Filtering based on multi-model consensus or named entity recognition (NER) is then applied to select and iteratively refine pseudo-labels, showing slower performance saturation compared to random selection. Evaluated on the multi-domain Wow call center and Fisher English corpora, it outperforms single-step fine-tuning. Consensus-based filtering outperforms other methods, providing up to 22.3% relative improvement on Wow and 24.8% on Fisher over single-step fine-tuning with random selection. NER is the second-best filter, providing competitive performance at a lower computational cost.
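A minimal sketch of the multi-model consensus filtering described in the abstract: an unlabeled utterance's pseudo-label is kept only when all ASR systems produce near-identical transcripts. The pairwise word-error-rate criterion, the 0.1 threshold, and the choice of which hypothesis to keep are illustrative assumptions, not the paper's exact recipe.

```python
from itertools import combinations

def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate via edit distance (insertions, deletions, substitutions)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def consensus_filter(hypotheses: list[list[str]], max_pairwise_wer: float = 0.1):
    """Keep an utterance only if all ASR models roughly agree on its transcript.

    hypotheses: one tokenized transcript per model for the same utterance.
    Returns the first hypothesis as the pseudo-label, or None if rejected.
    """
    for a, b in combinations(hypotheses, 2):
        if wer(a, b) > max_pairwise_wer:
            return None  # models disagree too much; discard this utterance
    return hypotheses[0]

# Example: three models transcribe one unlabeled utterance.
hyps = [
    "please reset my account password".split(),
    "please reset my account password".split(),
    "please recent my account password".split(),
]
print(consensus_filter(hyps))  # None at threshold 0.1 (one model disagrees)
```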
Related papers
- Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training [53.07879717463279]
Domain2Vec decomposes any dataset into a linear combination of several "meta-domains". It helps find the data mixture that enhances downstream task performance with minimal computational overhead.
arXiv Detail & Related papers (2025-06-12T17:53:51Z)
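A toy sketch of the "dataset as a linear combination of meta-domains" idea from the entry above, using non-negative least squares over made-up feature distributions; the actual Domain2Vec decomposition and its features are abstracted away here.

```python
import numpy as np
from scipy.optimize import nnls

# Toy meta-domain feature distributions (one per column) and a target
# dataset's distribution. In the paper these vectors come from a learned
# decomposition; here they are made-up stand-ins.
meta_domains = np.array([
    [0.7, 0.1, 0.1],   # feature 1 mass under meta-domains A, B, C
    [0.2, 0.6, 0.2],   # feature 2
    [0.1, 0.3, 0.7],   # feature 3
])
dataset_vec = np.array([0.4, 0.4, 0.2])

# Non-negative least squares: dataset_vec ~= meta_domains @ weights, weights >= 0.
weights, residual = nnls(meta_domains, dataset_vec)
mixture = weights / weights.sum()  # normalize onto the data-mixture simplex
print(mixture, residual)           # here: ~[0.5, 0.5, 0.0] with ~0 residual
```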
- Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering [11.50314008820538]
Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. We propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper and Zipformer.
arXiv Detail & Related papers (2025-06-04T08:11:24Z)
- Unsupervised Domain Adaptive Person Search via Dual Self-Calibration [12.158126976694488]
Unsupervised Domain Adaptive (UDA) person search focuses on applying a model trained on a labeled source-domain dataset to a target-domain dataset without any additional annotations. Most effective UDA person search methods typically utilize the ground truth of the source domain and pseudo-labels derived from clustering. We propose a Dual Self-Calibration (DSCA) framework for UDA person search that effectively eliminates the interference of noisy pseudo-labels.
arXiv Detail & Related papers (2024-12-21T06:54:00Z)
- Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement [19.277560848076984]
Two-stage selection strategies result in scale bias and redundancy due to mismatch between selected queries and objects.
We propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries.
The proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets.
arXiv Detail & Related papers (2024-03-24T13:01:57Z)
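A minimal sketch of the query-filtering step named in the entry above: score candidate queries with an (assumed) salience head and encode only the top fraction. The hierarchical, scale-aware refinement of the actual paper is omitted.

```python
import torch

def filter_salient_queries(queries: torch.Tensor, salience: torch.Tensor,
                           keep_ratio: float = 0.3):
    """Keep only the most discriminative queries before transformer encoding.

    queries:  (num_queries, dim) candidate query embeddings
    salience: (num_queries,) scalar salience score per query
    """
    k = max(1, int(keep_ratio * queries.size(0)))
    top_scores, top_idx = salience.topk(k)
    return queries[top_idx], top_idx

queries = torch.randn(300, 256)
salience = torch.rand(300)            # e.g., from a small scoring head
kept, idx = filter_salient_queries(queries, salience)
print(kept.shape)                     # torch.Size([90, 256])
```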
- Enhanced Federated Optimization: Adaptive Unbiased Client Sampling with Reduced Variance [37.646655530394604]
Federated Learning (FL) is a distributed learning paradigm to train a global model across multiple devices without collecting local data.
We present the first adaptive client sampler, K-Vib, employing an independent sampling procedure.
K-Vib achieves a linear speed-up on the regret bound $\tilde{\mathcal{O}}\big(N^{1/3}T^{2/3}/K^{4/3}\big)$ within a set communication budget.
arXiv Detail & Related papers (2023-10-04T10:08:01Z)
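A skeleton of the independent client sampling plus unbiased aggregation that the entry above builds on, shown with uniform inclusion probabilities; K-Vib's adaptive probability updates and variance-reduction objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def independent_sample(probs: np.ndarray) -> np.ndarray:
    """Each client i joins the round independently with probability probs[i]."""
    return rng.random(probs.shape[0]) < probs

def unbiased_aggregate(updates: dict, probs: np.ndarray, dim: int) -> np.ndarray:
    """Horvitz-Thompson estimator: dividing each update by its inclusion
    probability keeps the aggregate unbiased under non-uniform sampling."""
    total = np.zeros(dim)
    for i, u in updates.items():
        total += u / probs[i]
    return total / probs.shape[0]

n, dim = 10, 4
# Adaptive samplers like K-Vib would tune these toward "important" clients,
# e.g., proportional to past update norms; uniform probabilities shown here.
probs = np.full(n, 0.3)
mask = independent_sample(probs)
updates = {i: rng.normal(size=dim) for i in np.flatnonzero(mask)}
print(unbiased_aggregate(updates, probs, dim))
```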
- FilFL: Client Filtering for Optimized Client Participation in Federated Learning [71.46173076298957]
Federated learning enables clients to collaboratively train a model without exchanging local data.
Clients participating in the training process significantly impact the convergence rate, learning efficiency, and model generalization.
We propose a novel approach, client filtering, to improve model generalization and optimize client participation and training.
arXiv Detail & Related papers (2023-02-13T18:55:31Z)
- Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection [107.52026281057343]
We introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations.
In the first stage, we utilize all the original and augmented source data to train an object detector.
In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.
arXiv Detail & Related papers (2021-12-16T04:07:01Z)
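A sketch of one generic low-frequency filter operation of the kind the FSAC entry above relies on, implemented as a circular low-pass mask in the Fourier domain; the paper's four specific filter variants are not specified here.

```python
import numpy as np

def low_frequency_filter(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Keep only low spatial frequencies of a grayscale image.

    cutoff: radius of the retained band, as a fraction of the spectrum size.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= cutoff * min(h, w)          # circular low-pass mask
    filtered = spectrum * mask
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))

img = np.random.rand(64, 64)
aug = low_frequency_filter(img, cutoff=0.15)   # one possible low-pass variant
print(aug.shape)
```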
- Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization [28.22184410167622]
We present a data filtering method for open-domain dialogues.
We score training samples with a quality measure, sort them in descending order, and filter out those at the bottom.
Experimental results on two datasets show that our method can effectively identify untrustworthy samples.
arXiv Detail & Related papers (2021-09-14T06:42:54Z)
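The score-sort-filter procedure in the entry above is simple enough to state directly; the quality measure below is a toy length heuristic standing in for the learned measure the paper tunes with Bayesian optimization.

```python
def filter_training_samples(samples, quality_fn, drop_ratio=1 / 3):
    """Score samples, sort in descending order of quality, and drop the
    lowest-quality tail, mirroring the procedure described in the abstract."""
    scored = sorted(samples, key=quality_fn, reverse=True)
    keep = int(len(scored) * (1 - drop_ratio))
    return scored[:keep]

# Toy quality measure: penalize very short responses.
dialogues = [
    ("hi", "hello there, how can I help?"),
    ("how are you", "good"),
    ("bye", "goodbye, take care"),
]
kept = filter_training_samples(dialogues,
                               quality_fn=lambda pair: len(pair[1].split()))
print(kept)  # the two longer responses survive; ("how are you", "good") is dropped
```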
- Fast Variational AutoEncoder with Inverted Multi-Index for Collaborative Filtering [59.349057602266]
Variational AutoEncoder (VAE) has been extended as a representative nonlinear method for collaborative filtering.
We propose to decompose the inner-product-based softmax probability using the inverted multi-index.
FastVAE can outperform the state-of-the-art baselines in terms of both sampling quality and efficiency.
arXiv Detail & Related papers (2021-09-13T08:31:59Z)
- On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting.
We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
arXiv Detail & Related papers (2021-09-06T12:04:08Z)
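A sketch of generic Armijo backtracking, the standard building block behind the "global line search" mentioned in the entry above, demonstrated on a toy quadratic rather than the paper's distributed second-order setting.

```python
import numpy as np

def backtracking_line_search(f, x, direction, grad,
                             alpha0=1.0, beta=0.5, c=1e-4):
    """Armijo backtracking: shrink the step until sufficient decrease holds.
    A global check like this can rein in overly aggressive local (e.g.,
    second-order) update directions."""
    alpha = alpha0
    fx = f(x)
    while f(x + alpha * direction) > fx + c * alpha * grad.dot(direction):
        alpha *= beta
    return alpha

f = lambda x: np.sum(x ** 2)           # toy global objective
x = np.array([3.0, -2.0])
grad = 2 * x
direction = -grad                      # here: plain gradient-descent direction
step = backtracking_line_search(f, x, direction, grad)
print(step, f(x + step * direction))   # step 0.5 lands exactly at the minimum
```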
- Gradient Matching for Domain Generalization [93.04545793814486]
A critical requirement of machine learning systems is their ability to generalize to unseen domains.
We propose an inter-domain gradient matching objective that targets domain generalization.
We derive a simpler first-order algorithm named Fish that approximates its optimization.
arXiv Detail & Related papers (2021-04-20T12:55:37Z)
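A compact sketch in the spirit of the first-order Fish algorithm from the entry above: sequential SGD steps across domains on a cloned weight vector, then a move of the original weights toward the endpoint, which implicitly favors update directions on which per-domain gradients agree. The linear model, synthetic domains, and step sizes are illustrative assumptions.

```python
import torch

def fish_step(w, domain_batches, inner_lr=0.01, meta_lr=0.5):
    """Reptile-style meta-update approximating inter-domain gradient matching:
    run sequential SGD across domains on a clone, then move the original
    weights toward the clone's endpoint."""
    w_tilde = w.clone()
    for x, y in domain_batches:                      # one minibatch per domain
        w_tilde = w_tilde.detach().requires_grad_(True)
        loss = ((x @ w_tilde - y) ** 2).mean()       # per-domain squared loss
        (g,) = torch.autograd.grad(loss, w_tilde)
        w_tilde = w_tilde - inner_lr * g
    return w + meta_lr * (w_tilde.detach() - w)

w = torch.zeros(3)
batches = [(torch.randn(8, 3), torch.randn(8)) for _ in range(3)]  # 3 "domains"
w = fish_step(w, batches)
print(w)
```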
- Instance Level Affinity-Based Transfer for Unsupervised Domain Adaptation [74.71931918541748]
We propose an instance-affinity-based criterion for source-to-target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.