Improving Noisy Student Training on Non-target Domain Data for Automatic
Speech Recognition
- URL: http://arxiv.org/abs/2211.04717v1
- Date: Wed, 9 Nov 2022 07:23:15 GMT
- Title: Improving Noisy Student Training on Non-target Domain Data for Automatic
Speech Recognition
- Authors: Yu Chen, Wen Ding, Junjie Lai
- Abstract summary: We propose a data selection strategy named LM Filter to improve the performance of NST.
We achieve a 3.31% CER on the AISHELL-1 test set, which is, to our knowledge, the best result obtained without any additional supervised data.
We also evaluate on the supervised 1000-hour AISHELL-2 dataset and achieve a competitive CER of 4.72%.
- Score: 6.506420603456938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noisy Student Training (NST) has recently demonstrated extremely strong
performance in Automatic Speech Recognition (ASR). In this paper, we propose a
data selection strategy named LM Filter to improve the performance of NST on
non-target domain data in ASR tasks. Hypotheses are generated with and without a
Language Model, and the CER difference between them is used as the filtering
threshold. Results reveal a significant improvement of 10.4% over the
no-data-filtering baseline. We achieve a 3.31% CER on the AISHELL-1 test set,
which is, to our knowledge, the best result obtained without any additional
supervised data. We also evaluate on the supervised 1000-hour AISHELL-2 dataset,
where a competitive CER of 4.72% is achieved.
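The LM Filter strategy described above is straightforward to sketch. The following is a minimal, hypothetical Python illustration under stated assumptions: each unlabeled utterance is decoded twice, once with language-model fusion and once without, the two hypotheses are compared by CER, and only utterances whose disagreement falls below a threshold are kept for Noisy Student Training. The decoder callables, the threshold value, and the choice of which hypothesis to retain as the pseudo label are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the LM Filter idea: filter pseudo-labels for NST by the
# CER difference between with-LM and without-LM decoding hypotheses.
from typing import Callable, Iterable, List, Tuple


def cer(hyp: str, ref: str) -> float:
    """Character error rate: Levenshtein distance divided by reference length."""
    m, n = len(ref), len(hyp)
    if m == 0:
        return 0.0 if n == 0 else 1.0
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m


def lm_filter(
    audio_paths: Iterable[str],
    decode_with_lm: Callable[[str], str],     # hypothetical: decoding with LM fusion
    decode_without_lm: Callable[[str], str],  # hypothetical: acoustic-only decoding
    threshold: float = 0.10,                  # assumed value; the paper tunes this
) -> List[Tuple[str, str]]:
    """Keep (audio, pseudo_label) pairs whose two decoding passes agree closely."""
    kept = []
    for path in audio_paths:
        hyp_lm = decode_with_lm(path)
        hyp_no_lm = decode_without_lm(path)
        # Disagreement between the two passes serves as a proxy for
        # pseudo-label quality on non-target-domain audio.
        if cer(hyp_no_lm, hyp_lm) <= threshold:
            kept.append((path, hyp_lm))  # which hypothesis to keep is an assumption
    return kept
```

The filtered pairs would then feed the student model in the next NST iteration; in practice the decoders would wrap an existing ASR toolkit rather than the placeholder callables shown here.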
Related papers
- Enhancing Automated Essay Scoring with Three Techniques: Two-Stage Fine-Tuning, Score Alignment, and Self-Training [3.800498098285221]
This study proposes a novel approach to enhance AES performance in both limited-data and full-data settings. First, we introduce a Two-Stage fine-tuning strategy that leverages low-rank adaptations to better adapt an AES model to target prompt essays. Second, we introduce a Score Alignment technique to improve consistency between predicted and true score distributions. Third, we employ uncertainty-aware self-training using unlabeled data, effectively expanding the training set with pseudo-labeled samples.
arXiv Detail & Related papers (2026-02-02T07:29:15Z) - Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering [11.50314008820538]
Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. We propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper and Zipformer.
arXiv Detail & Related papers (2025-06-04T08:11:24Z) - RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection [29.459431336830267]
We propose a gradient-free method that quantifies the fine-grained contribution of individual samples to both task-level and global-level model performance. We introduce a lightweight selection paradigm trained on RICo scores, enabling scalable data selection with a strictly linear inference complexity.
arXiv Detail & Related papers (2025-05-08T15:17:37Z) - Improving Model Evaluation using SMART Filtering of Benchmark Datasets [19.731378662304497]
We propose a novel approach to select high-quality subsets of examples from existing benchmark datasets.
Our approach applies three filtering criteria, removing (i) easy examples, (ii) data-contaminated examples, and (iii) examples that are similar to each other.
We demonstrate the effectiveness of SMART on three multiple choice QA datasets.
arXiv Detail & Related papers (2024-10-26T18:21:44Z) - Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [56.24431208419858]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z) - An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution [5.1660803395535835]
Self-supervised learning (SSL) has shown stellar performance compared to traditional methods.
However, SSL-based ASA systems are faced with at least three data-related challenges.
These challenges include limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels.
arXiv Detail & Related papers (2024-04-11T09:06:49Z) - Your Vision-Language Model Itself Is a Strong Filter: Towards
High-Quality Instruction Tuning with Data Selection [59.11430077029321]
We introduce a novel dataset selection method, Self-Filter, for vision-language models (VLMs).
In the first stage, we devise a scoring network to evaluate the difficulty of training instructions, which is co-trained with the VLM.
In the second stage, we use the trained score net to measure the difficulty of each instruction, select the most challenging samples, and penalize similar samples to encourage diversity.
arXiv Detail & Related papers (2024-02-19T20:08:48Z) - Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and
Self-training of Neural Transducer [20.8850874806462]
This paper proposes a new approach to perform unsupervised fine-tuning and self-training using unlabeled speech data.
For the fine-tuning task, ASR models are trained using supervised data from Wall Street Journal (WSJ) and Aurora-4, with CHiME-4 real noisy data as the unlabeled data; the same data setup is used for the self-training task.
arXiv Detail & Related papers (2022-07-29T15:14:03Z) - Boosting Facial Expression Recognition by A Semi-Supervised Progressive
Teacher [54.50747989860957]
We propose a semi-supervised learning algorithm named Progressive Teacher (PT) to utilize reliable FER datasets as well as large-scale unlabeled expression images for effective training.
Experiments on widely-used databases RAF-DB and FERPlus validate the effectiveness of our method, which achieves state-of-the-art performance with accuracy of 89.57% on RAF-DB.
arXiv Detail & Related papers (2022-05-28T07:47:53Z) - Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z) - Improving RNN-T ASR Performance with Date-Time and Location Awareness [6.308539010172309]
We show that contextual information, when used individually, improves overall performance by as much as 3.48% relative to the baseline.
On specific domains, these contextual signals show improvements as high as 11.5%, without any significant degradation on others.
Our results indicate that with limited data to train the ASR model, contextual signals can improve the performance significantly.
arXiv Detail & Related papers (2021-06-11T05:57:30Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z) - DAGA: Data Augmentation with a Generation Approach for Low-resource
Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
arXiv Detail & Related papers (2020-11-03T07:49:15Z)