Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
- URL: http://arxiv.org/abs/2406.02566v1
- Date: Fri, 3 May 2024 19:24:41 GMT
- Title: Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
- Authors: Ognjen Kundacina, Vladimir Vincan, Dragisa Miskovic,
- Abstract summary: This paper introduces a novel two-stage active learning pipeline for automatic speech recognition (ASR).
The first stage utilizes unsupervised AL by using x-vectors clustering for diverse sample selection from unlabeled speech data.
The second stage incorporates a supervised AL strategy, with a batch AL method specifically developed for ASR.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emphasizing a data-centric AI approach, this paper introduces a novel two-stage active learning (AL) pipeline for automatic speech recognition (ASR), combining unsupervised and supervised AL methods. The first stage utilizes unsupervised AL by using x-vectors clustering for diverse sample selection from unlabeled speech data, thus establishing a robust initial dataset for the subsequent supervised AL. The second stage incorporates a supervised AL strategy, with a batch AL method specifically developed for ASR, aimed at selecting diverse and informative batches of samples. Here, sample diversity is also achieved using x-vectors clustering, while the most informative samples are identified using a Bayesian AL method tailored for ASR with an adaptation of Monte Carlo dropout to approximate Bayesian inference. This approach enables precise uncertainty estimation, thereby enhancing ASR model training with significantly reduced data requirements. Our method has shown superior performance compared to competing methods on homogeneous, heterogeneous, and OOD test sets, demonstrating that strategic sample selection and innovative Bayesian modeling can substantially optimize both labeling effort and data utilization in deep learning-based ASR applications.
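The two stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and every name in it (`kmeans`, `select_initial`, `mc_dropout_uncertainty`, `select_batch`) is hypothetical. It assumes that x-vectors have already been extracted and are passed in as plain numeric vectors, that `stochastic_transcribe` is a caller-supplied function that runs the ASR model with dropout left active, and that uncertainty can be approximated by how often repeated stochastic passes disagree on the transcript. The paper's actual uncertainty measure and batch construction may differ.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on embedding vectors; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def select_initial(points, k, seed=0):
    """Stage 1 (unsupervised): one sample per cluster, nearest its centroid."""
    centroids, labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members,
                              key=lambda i: math.dist(points[i], centroids[c])))
    return chosen


def mc_dropout_uncertainty(stochastic_transcribe, sample, passes=10):
    """Fraction of dropout passes that disagree with the most common output.

    Assumption: transcript disagreement stands in for the paper's measure.
    """
    outputs = [stochastic_transcribe(sample) for _ in range(passes)]
    top_count = Counter(outputs).most_common(1)[0][1]
    return 1.0 - top_count / passes


def select_batch(labels, uncertainty, batch_size):
    """Stage 2 (supervised): round-robin over clusters, most uncertain first."""
    by_cluster = {}
    for i, lab in enumerate(labels):
        by_cluster.setdefault(lab, []).append(i)
    queues = [sorted(m, key=lambda i: uncertainty[i], reverse=True)
              for m in by_cluster.values()]
    batch = []
    while len(batch) < batch_size and any(queues):
        for queue in queues:
            if queue and len(batch) < batch_size:
                batch.append(queue.pop(0))
    return batch
```

In this sketch, `select_initial` would pick the first utterances to transcribe, and each later round would score the remaining pool with `mc_dropout_uncertainty` before calling `select_batch`. Drawing from the clusters in turn is one simple way to keep a batch diverse while still favoring uncertain samples; it is an illustrative choice, not a claim about how the paper balances the two.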
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Anchor-aware Deep Metric Learning for Audio-visual Retrieval [11.675472891647255]
Metric learning aims at capturing the underlying data structure and enhancing the performance of tasks like audio-visual cross-modal retrieval (AV-CMR).
Recent works employ sampling methods to select impactful data points from the embedding space during training.
However, the model training fails to fully explore the space due to the scarcity of training data points.
We propose an innovative Anchor-aware Deep Metric Learning (AADML) method to address this challenge.
arXiv Detail & Related papers (2024-04-21T22:44:44Z)
- Mean-AP Guided Reinforced Active Learning for Object Detection [31.304039641225504]
This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL).
MGRAL is a novel approach that leverages the concept of expected model output changes as informativeness for deep detection networks.
Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.
arXiv Detail & Related papers (2023-10-12T14:59:22Z)
- A Global Model Approach to Robust Few-Shot SAR Automatic Target Recognition [6.260916845720537]
It may not always be possible to collect hundreds of labeled samples per class for training deep learning-based SAR Automatic Target Recognition (ATR) models.
This work specifically tackles the few-shot SAR ATR problem, where only a handful of labeled samples may be available to support the task of interest.
arXiv Detail & Related papers (2023-03-20T00:24:05Z)
- Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios [79.02009938011447]
We propose a sampling scheme, which selects optimal subsets of unlabeled samples with fixed batch size from the unlabeled data pool.
Experimental results show its effectiveness on both classical Machine Learning (ML) and Deep Learning (DL) tasks.
arXiv Detail & Related papers (2022-07-04T04:11:44Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Improving speech recognition models with small samples for air traffic control systems [9.322392779428505]
In this work, a novel training approach based on pretraining and transfer learning is proposed to address the issue of small training samples.
Three real ATC datasets are used to validate the proposed ASR model and training strategies.
The experimental results demonstrate that the ASR performance is significantly improved on all three datasets.
arXiv Detail & Related papers (2021-02-16T08:28:52Z)
- A Mathematical Programming approach to Binary Supervised Classification with Label Noise [1.2031796234206138]
We propose novel methodologies to construct Support Vector Machine-based classifiers.
The first method incorporates relabeling directly in the SVM model.
A second family of methods combines clustering and classification, giving rise to a model that simultaneously applies similarity measures and SVM.
arXiv Detail & Related papers (2020-04-21T17:25:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.