Speech Corpora Divergence Based Unsupervised Data Selection for ASR
- URL: http://arxiv.org/abs/2302.13222v1
- Date: Sun, 26 Feb 2023 03:26:26 GMT
- Title: Speech Corpora Divergence Based Unsupervised Data Selection for ASR
- Authors: Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan
- Abstract summary: This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD).
Experiments show that the proposed SCD data selection achieves a 14.8% relative improvement over random selection.
- Score: 30.224456184969693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selecting data that matches the application scenario is important for automatic speech recognition (ASR) training, but the degree to which a training corpus matches a target domain is difficult to measure. This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which measures the similarity between two speech corpora. We first use the self-supervised HuBERT model to discretize the speech corpora into label sequences and calculate N-gram probability distributions. Then we calculate the Kullback-Leibler divergence between the N-gram distributions as the SCD. Finally, we choose the subset with the minimum SCD to the target corpus for annotation and training. Compared to previous data selection methods, SCD data selection captures more acoustic detail and guarantees the diversity of the selected set. We evaluate our method on different accents from Common Voice. Experiments show that the proposed SCD data selection achieves a 14.8% relative improvement over random selection, comparable to or even better than supervised selection.
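As a rough illustration of the pipeline, the sketch below scores and greedily picks a subset by SCD. It assumes the audio has already been discretized into HuBERT cluster-label sequences (that step is not shown), and the smoothing constant, the KL direction, and the greedy utterance-level selection loop are illustrative assumptions, not the authors' exact implementation.

```python
from collections import Counter
from math import log

def ngram_dist(label_seqs, n, vocab, alpha=1e-6):
    """Smoothed N-gram distribution over discretized speech labels.
    Additive smoothing keeps every probability nonzero so KL stays finite."""
    counts = Counter()
    for seq in label_seqs:
        counts.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(counts.values()) + alpha * len(vocab)
    return {g: (counts[g] + alpha) / total for g in vocab}

def scd(p_target, p_subset):
    """Speech corpora divergence: KL(target || subset) between N-gram
    distributions. (The KL direction is an assumption; the abstract
    does not pin it down.)"""
    return sum(p * log(p / p_subset[g]) for g, p in p_target.items())

def select_subset(candidates, target_seqs, budget, n=2):
    """Greedily add the utterance that keeps the selected pool's
    N-gram distribution closest to the target corpus."""
    # Shared vocabulary so both distributions have the same support.
    vocab = set()
    for seq in list(candidates) + list(target_seqs):
        vocab.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    p_target = ngram_dist(target_seqs, n, vocab)
    selected, pool = [], list(candidates)
    while len(selected) < budget and pool:
        best = min(pool, key=lambda u: scd(
            p_target, ngram_dist(selected + [u], n, vocab)))
        selected.append(best)
        pool.remove(best)
    return selected
```

Because an utterance contributes less once its N-grams are already covered, the greedy loop naturally rewards variety in the selected set, which is consistent with the diversity property claimed in the abstract.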
Related papers
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion [88.59397418187226]
We propose a novel unified open-vocabulary detection method called OV-DINO.
It is pre-trained on diverse large-scale datasets with language-aware selective fusion in a unified framework.
We evaluate the performance of the proposed OV-DINO on popular open-vocabulary detection benchmarks.
arXiv Detail & Related papers (2024-07-10T17:05:49Z)
- Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low-dimensions for LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours (a toy sketch of the hashing idea follows this entry).
arXiv Detail & Related papers (2023-02-06T23:57:56Z)
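For context on the hashed n-gram idea mentioned in the DSIR entry above, here is a toy sketch; the bucket count, bigram features, smoothing, and Gumbel top-k resampling are assumptions for illustration, not the paper's exact recipe.

```python
import hashlib
import math
import random
from collections import Counter

BUCKETS = 10_000  # arbitrary illustrative hash width

def hashed_ngrams(text, n=2):
    """Hash each word n-gram of a document into a fixed bucket index."""
    w = text.lower().split()
    return Counter(
        int(hashlib.md5(" ".join(w[i:i + n]).encode()).hexdigest(), 16) % BUCKETS
        for i in range(len(w) - n + 1))

def bucket_dist(docs, alpha=1.0):
    """Smoothed categorical distribution over hash buckets for a corpus."""
    c = Counter()
    for d in docs:
        c.update(hashed_ngrams(d))
    total = sum(c.values()) + alpha * BUCKETS
    return [(c[b] + alpha) / total for b in range(BUCKETS)]

def dsir_select(raw_docs, target_docs, k):
    """Resample k raw documents with importance weights
    log p_target(x) - log p_raw(x); Gumbel noise turns the
    deterministic top-k into a sample."""
    p_t, p_r = bucket_dist(target_docs), bucket_dist(raw_docs)
    def key(doc):
        f = hashed_ngrams(doc)
        log_w = sum(c * (math.log(p_t[b]) - math.log(p_r[b]))
                    for b, c in f.items())
        return log_w - math.log(-math.log(random.random()))  # + Gumbel noise
    return sorted(raw_docs, key=key, reverse=True)[:k]
```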
- Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models [13.956691231452336]
Self-supervised learning (SSL) has been able to leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models.
Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget.
arXiv Detail & Related papers (2022-12-03T18:05:08Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment (a sketch of the standard hard-minimum PIT loss follows this entry).
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
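To make the label-ambiguity problem concrete, below is a minimal PyTorch sketch of a standard (hard-minimum) PIT loss; the MSE criterion and tensor shapes are illustrative assumptions, and the paper's contribution is a soft-minimum relaxation of the hard min shown here.

```python
import itertools
import torch

def pit_mse_loss(est, ref):
    """Hard-min PIT loss. est, ref: (batch, num_spk, time) estimated and
    reference signals. For each utterance, score every speaker permutation
    and train on the cheapest assignment."""
    batch, spk, _ = est.shape
    # (batch, est_spk, ref_spk) matrix of pairwise MSEs
    pair = ((est.unsqueeze(2) - ref.unsqueeze(1)) ** 2).mean(dim=-1)
    rows = torch.arange(spk)
    perm_losses = torch.stack(
        [pair[:, rows, torch.tensor(p)].mean(dim=-1)
         for p in itertools.permutations(range(spk))], dim=1)
    return perm_losses.min(dim=1).values.mean()
```

Enumerating all assignments costs O(num_spk!), which is one reason soft-minimum or assignment-based variants become attractive as the speaker count grows.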
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Knowledge Distillation and Data Selection for Semi-Supervised Learning in CTC Acoustic Models [9.496916045581736]
Semi-supervised learning (SSL) is an active area of research which aims to utilize unlabelled data in order to improve the accuracy of speech recognition systems.
Our aim is to establish the importance of good criteria in selecting samples from a large pool of unlabelled data.
We perform empirical investigations of different data selection methods to answer this question and quantify the effect of different sampling strategies.
arXiv Detail & Related papers (2020-08-10T07:00:08Z)
- Active Learning for Sound Event Detection [18.750572243562576]
This paper proposes an active learning system for sound event detection (SED).
It aims at maximizing the accuracy of a learned SED model with limited annotation effort.
Remarkably, the required annotation effort can be greatly reduced on the dataset where target sound events are rare.
arXiv Detail & Related papers (2020-02-12T14:46:55Z)