Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised
Speech Models
- URL: http://arxiv.org/abs/2212.01661v1
- Date: Sat, 3 Dec 2022 18:05:08 GMT
- Title: Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised
Speech Models
- Authors: Reem Gody and David Harwath
- Abstract summary: Self-supervised learning (SSL) has been able to leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models.
Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget.
- Score: 13.956691231452336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) has been able to leverage unlabeled data to
boost the performance of automatic speech recognition (ASR) models when we have
access to only a small amount of transcribed speech data. However, this raises
the question of which subset of the available unlabeled data should be selected
for transcription. Our work investigates different unsupervised data selection
techniques for fine-tuning the HuBERT model under a limited transcription
budget. We investigate the impact of speaker diversity, gender bias, and topic
diversity on the downstream ASR performance. We also devise two novel
techniques for unsupervised data selection: pre-training loss based data
selection, and the perplexity of byte pair encoded clustered units (PBPE), and
we show how these techniques compare to pure random data selection. Finally,
we analyze the correlations between the inherent characteristics of the
selected fine-tuning subsets, as well as how these characteristics correlate
with the resultant word error rate (WER). We demonstrate the importance of
token diversity, speaker diversity, and topic diversity in achieving the best
performance in terms of WER.
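No code accompanies this summary, so the following is only a minimal, self-contained sketch of how a PBPE-style selection score could work, assuming utterances have already been discretized into k-means cluster units of SSL features. The toy unit sequences, BPE merge table, and add-one-smoothed unigram LM are hypothetical stand-ins for the paper's actual HuBERT pipeline.

```python
"""Illustrative sketch (not the authors' code) of PBPE-style selection:
score each utterance by the perplexity of its byte-pair-encoded
cluster-unit sequence, then spend the transcription budget on the
highest-scoring utterances."""
import math
from collections import Counter

def bpe_encode(units, merges):
    """Greedily apply pre-learned BPE merges to a cluster-unit sequence.
    `merges` maps an adjacent unit pair, e.g. (3, 7), to a new token id."""
    seq = list(units)
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) in merges:
                out.append(merges[(seq[i], seq[i + 1])])
                i += 2
                changed = True
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq

def perplexity(tokens, counts, total):
    """Perplexity under an add-one-smoothed unigram LM (a stand-in for
    whatever LM is trained on the BPE'd units)."""
    vocab = len(counts)
    log_p = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_p / max(len(tokens), 1))

# Toy pool: each utterance is a sequence of pseudo-labels (k-means
# cluster ids of SSL features); real ids would come from HuBERT.
pool = {
    "utt1": [3, 3, 7, 7, 7, 12, 5, 5],
    "utt2": [1, 9, 9, 2, 14, 14, 14, 6],
    "utt3": [3, 7, 12, 3, 7, 12, 3, 7],
}
merges = {(3, 7): 100, (7, 12): 101}  # assumed pre-learned BPE merges

encoded = {u: bpe_encode(seq, merges) for u, seq in pool.items()}
counts = Counter(t for seq in encoded.values() for t in seq)
total = sum(counts.values())

scores = {u: perplexity(seq, counts, total) for u, seq in encoded.items()}
budget = 2  # transcription budget in utterances
selected = sorted(scores, key=scores.get, reverse=True)[:budget]
print(scores, selected)
```

Whether high or low perplexity should be preferred is exactly the kind of question the paper's correlation analysis addresses; the descending sort here is just one plausible choice.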
Related papers
- Speech Corpora Divergence Based Unsupervised Data Selection for ASR [30.224456184969693]
This study proposes an unsupervised, target-aware data selection method based on speech corpora divergence (SCD).
Experiments show that the proposed SCD data selection achieves a 14.8% relative improvement over random selection (a rough sketch of the idea follows below).
arXiv Detail & Related papers (2023-02-26T03:26:26Z)
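Working only from the one-line summary above, here is a rough sketch of what divergence-based, target-aware selection could look like: greedily add the utterance whose discrete units keep the pooled distribution of the selected set closest (in KL divergence) to a target corpus. The greedy loop, the KL choice, and the toy unit sequences are assumptions, not the SCD paper's exact procedure.

```python
"""Rough sketch (assumptions throughout) of divergence-based, target-aware
data selection for ASR fine-tuning."""
import math
from collections import Counter

def kl(p, q, vocab, eps=1e-9):
    """KL(p || q) over a shared unit vocabulary, with epsilon smoothing."""
    return sum(p.get(u, eps) * math.log(p.get(u, eps) / q.get(u, eps))
               for u in vocab)

def distribution(counter):
    total = sum(counter.values())
    return {u: c / total for u, c in counter.items()}

# Toy discrete-unit counts per utterance (e.g. k-means ids of SSL features).
target = Counter([1, 1, 2, 3, 3, 3, 4])      # units of the target corpus
pool = {
    "a": Counter([1, 2, 3, 3]),
    "b": Counter([5, 5, 6, 6]),
    "c": Counter([3, 3, 4, 1]),
}
vocab = set(target) | {u for c in pool.values() for u in c}

selected, acc = [], Counter()
for _ in range(2):                           # selection budget: 2 utterances
    best = min((u for u in pool if u not in selected),
               key=lambda u: kl(distribution(target),
                                distribution(acc + pool[u]), vocab))
    selected.append(best)
    acc += pool[best]
print(selected)   # utterances whose units best match the target corpus
```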
- Learning to Detect Noisy Labels Using Model-Based Features [16.681748918518075]
We propose Selection-Enhanced Noisy label Training (SENT).
SENT does not rely on meta learning while having the flexibility of being data-driven.
It improves performance over strong baselines under the settings of self-training and label corruption.
arXiv Detail & Related papers (2022-12-28T10:12:13Z)
- Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
The Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes its decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt a model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area (its core mechanism is sketched below).
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
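As a hedged illustration of the usual mechanism behind this kind of source-free adaptation, the sketch below updates a model on one unlabeled utterance by minimizing the entropy of its frame-level output distribution. The stand-in linear model, SGD optimizer, learning rate, and step count are illustrative assumptions rather than SUTA's published recipe.

```python
"""Hedged sketch of entropy-minimisation test-time adaptation on a
single utterance: no labels, no source data, just a few gradient steps
that sharpen the model's own frame-level predictions before decoding."""
import torch

def adapt_step(model, feats, optimizer):
    """One unsupervised adaptation step on a single utterance."""
    logits = model(feats)                  # (frames, vocab), e.g. a CTC head
    probs = torch.softmax(logits, dim=-1)
    # Mean per-frame entropy of the predictive distribution.
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()

# Stand-in acoustic model and features (80-dim frames, 30-symbol vocab);
# in practice one would adapt a copy of the source ASR model, then decode.
model = torch.nn.Linear(80, 30)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
utterance = torch.randn(200, 80)           # one unlabeled test utterance
for _ in range(5):                         # a handful of adaptation steps
    adapt_step(model, utterance, optimizer)
```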
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492]
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
arXiv Detail & Related papers (2022-03-18T10:12:24Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch between the statistical distributions of acoustic speech in the training and testing sets, the performance of spoken language identification (SLID) can be drastically degraded.
We propose an unsupervised neural adaptation model to deal with this distribution mismatch problem for SLID (the optimal-transport ingredient is sketched below).
arXiv Detail & Related papers (2020-12-24T07:37:19Z)
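As a rough illustration of the optimal-transport ingredient, the sketch below computes an entropy-regularised (Sinkhorn) transport cost between source and target feature batches, which could serve as an unsupervised alignment loss; the feature shapes, regularisation strength, and iteration count are assumptions rather than the paper's model.

```python
"""Minimal Sinkhorn sketch (an assumed mechanism, not the paper's exact
model): an entropy-regularised optimal-transport cost between source and
target feature batches, usable as an unsupervised adaptation loss."""
import torch

def sinkhorn_cost(x, y, reg=0.1, iters=50):
    """Entropy-regularised OT cost between two batches of embeddings."""
    cost = torch.cdist(x, y) ** 2          # pairwise squared L2 distances
    cost = cost / cost.max()               # rescale to [0, 1] for stability
    k = torch.exp(-cost / reg)             # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))   # uniform marginals
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    u = torch.ones_like(a)
    for _ in range(iters):                 # Sinkhorn fixed-point iterations
        v = b / (k.t() @ u)
        u = a / (k @ v)
    plan = torch.diag(u) @ k @ torch.diag(v)   # transport plan
    return (plan * cost).sum()

src = torch.randn(16, 64)        # source-domain utterance embeddings (toy)
tgt = torch.randn(16, 64)        # unlabeled target-domain embeddings (toy)
loss = sinkhorn_cost(src, tgt)   # minimise to pull the domains together
```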
- Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models [9.496916045581736]
Semi-supervised learning (SSL) is an active area of research which aims to utilize unlabelled data in order to improve the accuracy of speech recognition systems.
Our aim is to establish the importance of good criteria in selecting samples from a large pool of unlabelled data.
We perform empirical investigations of different data selection methods to answer this question and quantify the effect of different sampling strategies (one common confidence-based criterion is sketched below).
arXiv Detail & Related papers (2020-08-10T07:00:08Z)
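The summary does not spell out which criteria the paper investigates; below is a hedged sketch of one common criterion for this setting: rank unlabelled utterances by the model's own mean frame-level confidence and keep only the most confident for semi-supervised training. The scoring rule and keep-ratio are assumptions, not the paper's reported best criterion.

```python
"""Hedged sketch of a confidence-based selection criterion: keep only the
unlabelled utterances the current model is most confident about, as judged
by its mean frame-level max posterior."""
import torch

def confidence(log_probs):
    """Mean max posterior per frame for one utterance.
    `log_probs`: (frames, vocab) log-softmax output of a CTC model."""
    return log_probs.exp().max(dim=-1).values.mean().item()

# Toy pool of model outputs for unlabelled utterances (100 frames,
# 30-symbol vocabulary); real scores would come from the seed ASR model.
pool = {f"utt{i}": torch.log_softmax(torch.randn(100, 30), dim=-1)
        for i in range(6)}
scores = {utt: confidence(lp) for utt, lp in pool.items()}

# Keep the most confident half as pseudo-labelled training data.
keep = sorted(scores, key=scores.get, reverse=True)[: len(scores) // 2]
print(keep)
```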
- Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation [125.59372403631006]
We propose a semi-supervised learning approach for multi-speaker text-to-speech (TTS).
A multi-speaker TTS model can learn from the untranscribed audio via the proposed encoder-decoder framework with discrete speech representation.
We found the model can benefit from the proposed semi-supervised learning approach even when part of the unpaired speech data is noisy.
arXiv Detail & Related papers (2020-05-16T15:47:11Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlations in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.