Related papers: Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

URL: http://arxiv.org/abs/2506.03681v1
Date: Wed, 04 Jun 2025 08:11:24 GMT
Title: Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering
Authors: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke
Abstract summary: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. We propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper and Zipformer.
Score: 11.50314008820538
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple selection strategies -- including word error rate (WER) prediction, named entity recognition (NER), and character error rate (CER) analysis -- to extract high-quality training segments. We evaluate our method on Whisper and Zipformer using a 7500-hour baseline, comparing it to a CER-based approach relying on hypotheses from three ASR systems. Fine-tuning on 7500 hours of pseudo-labeled call center data achieves 12.3% WER, while our filtering reduces the dataset to 100 hours (1.4%) with similar performance; a similar trend is observed on Fisher English.
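The abstract's comparison baseline is a CER-based approach over hypotheses from three ASR systems: a segment is trusted when the systems' transcripts agree. A minimal sketch of that idea, assuming the worst pairwise CER as the agreement measure and a hypothetical 5% threshold (the paper's exact rule and threshold are not given here):

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hyp measured against ref."""
    if not ref:
        return 0.0 if not hyp else 1.0  # convention for an empty reference (assumption)
    return edit_distance(ref, hyp) / len(ref)


def max_pairwise_cer(hyps: list[str]) -> float:
    """Worst disagreement between any two systems; CER is asymmetric,
    so every ordered pair is checked."""
    return max(cer(a, b) for a, b in permutations(hyps, 2))


def select_by_agreement(segments: dict[str, list[str]], threshold: float = 0.05) -> list[str]:
    """Keep segment ids whose hypotheses from all ASR systems agree within threshold."""
    return [seg for seg, hyps in segments.items() if max_pairwise_cer(hyps) <= threshold]


segments = {
    "utt1": ["thank you for calling", "thank you for calling", "thank you for calling"],
    "utt2": ["my account number is", "my count number is", "by account number his"],
}
print(select_by_agreement(segments))  # ['utt1']
```

Which of the agreeing hypotheses becomes the training pseudo-label is left open in this sketch.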

Related papers

Multimodal Consistency-Guided Reference-Free Data Selection for ASR Accent Adaptation [0.05219568203653524]
We introduce a multimodal consistency-guided, reference-free data selection pipeline for ASR accent adaptation. The pipeline scores each hypothesis using two reference-free signals: speech-text alignment in a shared embedding space and predicted word error rate. A simple percentile-based selection rule retains reliable pseudo-labels for fine-tuning while discarding noisy utterances.
arXiv Detail & Related papers (2026-02-03T21:35:58Z)
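A percentile-based rule sets its cut-offs from the score distribution itself rather than from fixed constants. A sketch, assuming each utterance already carries an alignment score (higher is better) and a predicted WER (lower is better), and assuming both conditions must hold; the field names and the median cut-offs are hypothetical, and the paper's actual percentiles and combination rule may differ:

```python
import math


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile for q in [0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def select_reliable(utts: list[dict], align_pct: float = 50, wer_pct: float = 50) -> list[str]:
    """Keep utterances whose alignment score is at or above the align_pct-th
    percentile AND whose predicted WER is at or below the wer_pct-th percentile.
    Keys 'id', 'align', 'pred_wer' are hypothetical."""
    align_cut = percentile([u["align"] for u in utts], align_pct)
    wer_cut = percentile([u["pred_wer"] for u in utts], wer_pct)
    return [u["id"] for u in utts if u["align"] >= align_cut and u["pred_wer"] <= wer_cut]


utts = [
    {"id": "a", "align": 0.9, "pred_wer": 0.05},
    {"id": "b", "align": 0.8, "pred_wer": 0.30},
    {"id": "c", "align": 0.4, "pred_wer": 0.10},
    {"id": "d", "align": 0.3, "pred_wer": 0.50},
]
print(select_reliable(utts))  # ['a', 'c']
```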
Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering [11.50314008820538]
Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. We propose an incremental semi-supervised learning pipeline that integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain.
arXiv Detail & Related papers (2025-06-05T12:53:20Z)
Improving Model Evaluation using SMART Filtering of Benchmark Datasets [19.731378662304497]
We propose a novel approach to select high-quality subsets of examples from existing benchmark datasets. Our approach applies three filtering criteria, removing (i) easy examples, (ii) data-contaminated examples, and (iii) examples that are similar to each other. We demonstrate the effectiveness of SMART on three multiple choice QA datasets.
arXiv Detail & Related papers (2024-10-26T18:21:44Z)
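The three criteria compose naturally as sequential filters. A sketch under stated assumptions: per-example correctness of a set of reference models, a precomputed contamination flag, and embedding vectors are taken as given (all field names hypothetical), and near-duplicates are detected with an illustrative cosine-similarity threshold; how SMART itself computes each signal is not described in this summary:

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def smart_style_filter(examples: list[dict], sim_threshold: float = 0.95) -> list[str]:
    """examples: dicts with hypothetical keys 'id', 'model_correct' (list[bool]),
    'contaminated' (bool) and 'emb' (list[float]). Returns ids that survive."""
    kept = []
    for ex in examples:
        if all(ex["model_correct"]):  # (i) easy: every model already answers correctly
            continue
        if ex["contaminated"]:         # (ii) flagged as likely seen during training
            continue
        # (iii) near-duplicate of an example that was already kept
        if any(cosine(ex["emb"], k["emb"]) >= sim_threshold for k in kept):
            continue
        kept.append(ex)
    return [ex["id"] for ex in kept]


examples = [
    {"id": "q1", "model_correct": [True, True, True], "contaminated": False, "emb": [1, 0]},
    {"id": "q2", "model_correct": [True, False, True], "contaminated": True, "emb": [0, 1]},
    {"id": "q3", "model_correct": [False, False, True], "contaminated": False, "emb": [1, 1]},
    {"id": "q4", "model_correct": [False, True, False], "contaminated": False, "emb": [1, 1.01]},
    {"id": "q5", "model_correct": [False, False, False], "contaminated": False, "emb": [1, -1]},
]
print(smart_style_filter(examples))  # ['q3', 'q5']
```

Comparing each candidate only against already-kept examples means the first member of a near-duplicate group survives; other tie-breaking policies are equally plausible.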
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for lifelong instruction tuning. We construct pseudo-skill clusters by grouping gradient-based sample vectors. We select the best-performing data selector for each skill cluster from a pool of selector experts. This data selector samples a subset of the most important samples from each skill cluster for training.
arXiv Detail & Related papers (2024-10-14T15:48:09Z)
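The per-cluster selection step can be sketched as follows, assuming the gradient-based pseudo-skill clusters have already been formed and that each selector expert's quality on a cluster is summarized by a single validation number. All names are hypothetical and the clustering itself is omitted:

```python
from collections import defaultdict
from typing import Callable


def select_per_cluster(
    samples: list[tuple[str, int]],
    selectors: dict[str, Callable[[str], float]],
    val_scores: dict[tuple[str, int], float],
    budget: int,
) -> dict[int, list[str]]:
    """samples: (sample_id, cluster_id) pairs, clusters assumed precomputed.
    selectors: selector name -> importance-scoring function (hypothetical).
    val_scores: (selector_name, cluster_id) -> how well that selector did on the cluster.
    Returns the top-`budget` samples of each cluster under its best selector."""
    by_cluster = defaultdict(list)
    for sample_id, cluster_id in samples:
        by_cluster[cluster_id].append(sample_id)

    chosen = {}
    for cluster_id, ids in by_cluster.items():
        # Pick the selector expert that scored best on this cluster.
        best = max(selectors, key=lambda name: val_scores[(name, cluster_id)])
        # Rank the cluster's samples by that selector's importance score.
        ranked = sorted(ids, key=selectors[best], reverse=True)
        chosen[cluster_id] = ranked[:budget]
    return chosen


length = {"s1": 3, "s2": 1, "s3": 2, "s4": 5, "s5": 4}
loss = {"s1": 0.1, "s2": 0.9, "s3": 0.5, "s4": 0.2, "s5": 0.8}
selectors = {"length": length.get, "loss": loss.get}
val_scores = {("length", 0): 0.2, ("loss", 0): 0.7, ("length", 1): 0.6, ("loss", 1): 0.4}
samples = [("s1", 0), ("s2", 0), ("s3", 0), ("s4", 1), ("s5", 1)]
print(select_per_cluster(samples, selectors, val_scores, budget=2))
# {0: ['s2', 's3'], 1: ['s4', 's5']}
```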
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency. We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question. Our experiments show that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
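Adaptive-Consistency replaces a fixed number of Self-Consistency samples with a stopping rule checked after each new sample. The sketch below substitutes a deliberately simple rule, stopping once the leading answer holds a fixed share of the votes; it is not the paper's stopping criterion, and `sample_answer` is a hypothetical stand-in for one LLM call:

```python
from collections import Counter
from typing import Callable


def adaptive_vote(
    sample_answer: Callable[[], str],
    max_samples: int = 40,
    min_samples: int = 3,
    threshold: float = 0.95,
) -> tuple[str, int]:
    """Majority vote that stops early once the leading answer holds at least
    `threshold` of the votes (after at least `min_samples` draws).
    Returns (answer, number_of_samples_used)."""
    votes = Counter()
    for n in range(1, max_samples + 1):
        votes[sample_answer()] += 1
        answer, count = votes.most_common(1)[0]
        if n >= min_samples and count / n >= threshold:
            return answer, n
    # Budget exhausted: fall back to the plain majority answer.
    return votes.most_common(1)[0][0], max_samples


print(adaptive_vote(lambda: "42"))  # ('42', 3): unanimous, so it stops at min_samples
answers = iter(["7", "9", "7", "7", "7", "9"])
print(adaptive_vote(answers.__next__, threshold=0.8))  # ('7', 5)
```

The savings come from easy questions: when early samples agree, the loop exits long before `max_samples`.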
Train/Test-Time Adaptation with Retrieval [129.8579208970529]
We introduce Train/Test-Time Adaptation with Retrieval (${\rm T^3AR}$), a method to adapt models both at train and test time. ${\rm T^3AR}$ adapts a given model to the downstream task using refined pseudo-labels and a self-supervised contrastive objective function. Thanks to the retrieval module, our method gives the user or service provider the possibility to improve model adaptation on the downstream task.
arXiv Detail & Related papers (2023-03-25T02:44:57Z)
Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition [6.506420603456938]
We propose a data selection strategy named LM Filter to improve the performance of noisy student training (NST). We achieve 3.31% CER on the AISHELL-1 test set, which is, to our knowledge, the best result without any other supervised data. We also evaluate on the supervised 1000-hour AISHELL-2 dataset, where a competitive CER of 4.72% is achieved.
arXiv Detail & Related papers (2022-11-09T07:23:15Z)
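As we understand it, LM Filter decodes each unlabeled utterance both with and without a language model and keeps it only when the two transcripts are close, treating large LM-induced changes as a sign of an unreliable pseudo-label. A sketch under that assumption, with a hypothetical 10% CER threshold and the LM-decoded text kept as the pseudo-label (both choices are illustrative):

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate; an empty reference is treated as length 1 to avoid /0."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def lm_filter(decodes: dict[str, tuple[str, str]], max_cer: float = 0.10) -> dict[str, str]:
    """decodes: utt_id -> (hyp_without_lm, hyp_with_lm).
    Returns utt_id -> pseudo-label for utterances that pass the filter."""
    kept = {}
    for utt_id, (no_lm, with_lm) in decodes.items():
        # Use the LM-decoded text as the reference and measure how far
        # the LM-free decoding is from it.
        if cer(with_lm, no_lm) <= max_cer:
            kept[utt_id] = with_lm
    return kept


decodes = {
    "u1": ("今天天气很好", "今天天气很好"),  # identical decodings
    "u2": ("我想去背景", "我想去北京"),      # LM changed 2 of 5 characters
}
print(lm_filter(decodes))  # {'u1': '今天天气很好'}
```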
Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time adaptation (TTA) aims to adapt a model trained on source domains to yield better predictions for test samples. Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
Filter-enhanced MLP is All You Need for Sequential Recommendation [89.0974365344997]
In online platforms, logged user behavior data inevitably contains noise. We borrow the idea of filtering algorithms from signal processing, which attenuate noise in the frequency domain. We propose FMLP-Rec, an all-MLP model with learnable filters for the sequential recommendation task.
arXiv Detail & Related papers (2022-02-28T05:49:35Z)
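The frequency-domain filtering idea can be shown on a single embedding dimension: transform the sequence to the frequency domain, scale each frequency bin by a filter weight, and transform back. This sketch uses a naive O(n²) DFT from the standard library instead of an FFT, and fixed weights instead of learned ones; it illustrates the filtering operation, not the FMLP-Rec architecture:

```python
import cmath


def dft(x: list[complex]) -> list[complex]:
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: list[complex]) -> list[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def frequency_filter(seq: list[float], weights: list[float]) -> list[float]:
    """Scale frequency bin k of `seq` by weights[k] and return the filtered sequence.
    Real weights with weights[k] == weights[n - k] keep the result real;
    taking .real only discards floating-point residue."""
    filtered = [s * w for s, w in zip(dft(seq), weights)]
    return [v.real for v in idft(filtered)]


seq = [1.0, 2.0, 3.0, 4.0]       # one embedding dimension over 4 time steps
low_pass = [1.0, 0.0, 0.0, 0.0]  # keep only the zero-frequency (mean) component
print([round(v, 6) for v in frequency_filter(seq, low_pass)])  # [2.5, 2.5, 2.5, 2.5]
```

With all weights set to 1 the filter is the identity; zeroing the high-frequency bins smooths short-lived fluctuations, which is the sense in which such a filter attenuates noise.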
GOLD: Improving Out-of-Scope Detection in Dialogues using Data Augmentation [41.04593978694591]
The GOLD technique augments existing data to train better out-of-scope (OOS) detectors operating in low-data regimes. In experiments across three target benchmarks, the top GOLD model outperforms all existing methods on all key metrics.
arXiv Detail & Related papers (2021-09-07T13:35:03Z)
Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task. Unsupervised Domain Adaptation (UDA) is a promising alternative, as it performs feature-learning adaptation from a model trained on a source to a target domain without identity-label annotation. In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees [17.152864798265455]
We propose a novel mixed-integer programming (MIP) formulation, based on a 1-norm support vector machine model, to train a multivariate optimal decision tree (ODT) for classification problems. We provide cutting-plane techniques that tighten the linear relaxation of the MIP formulation in order to improve run times to reach optimality. We demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy.
arXiv Detail & Related papers (2020-11-06T14:17:41Z)
Collaborative Training between Region Proposal Localization and Classification for Domain Adaptive Object Detection [121.28769542994664]
Domain adaptation for object detection tries to adapt the detector from labeled datasets to unlabeled ones for better performance. In this paper, we are the first to reveal that the region proposal network (RPN) and region proposal classifier (RPC) demonstrate significantly different transferability when facing a large domain gap.
arXiv Detail & Related papers (2020-09-17T07:39:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.