Stabilizing Label Assignment for Speech Separation by Self-supervised
Pre-training
- URL: http://arxiv.org/abs/2010.15366v3
- Date: Sun, 22 Aug 2021 06:26:39 GMT
- Title: Stabilizing Label Assignment for Speech Separation by Self-supervised
Pre-training
- Authors: Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping
Yang, Hung-yi Lee
- Abstract summary: We propose to perform self-supervised pre-training to stabilize the label assignment in training the speech separation model.
Experiments over several types of self-supervised approaches, several typical speech separation models and two different datasets showed that very good improvements are achievable if a proper self-supervised approach is chosen.
- Score: 58.30339239234169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech separation has been well developed with the very successful
permutation invariant training (PIT) approach, although the frequent label
assignment switching that occurs during PIT training remains a problem when
faster convergence and better achievable performance are desired. In this paper,
we propose to perform self-supervised pre-training to stabilize the label
assignment in training the speech separation model. Experiments over several
types of self-supervised approaches, several typical speech separation models
and two different datasets showed that very good improvements are achievable if
a proper self-supervised approach is chosen.
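For readers who want to see what PIT's label assignment looks like concretely, below is a minimal sketch of utterance-level permutation invariant training; the two-speaker setup, MSE-style loss, and tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal utterance-level PIT sketch (illustrative assumptions: MSE-style loss,
# two speakers, tensors shaped (batch, n_speakers, time); not the paper's setup).
from itertools import permutations

import torch

def pit_loss(estimates, targets):
    """Return the minimum loss over all speaker permutations and the chosen assignments."""
    n_src = estimates.shape[1]
    perms = list(permutations(range(n_src)))          # all candidate label assignments
    per_perm = torch.stack(
        [((estimates - targets[:, list(p), :]) ** 2).mean(dim=(1, 2)) for p in perms],
        dim=1,
    )                                                  # (batch, n_perms)
    min_loss, best = per_perm.min(dim=1)               # best assignment per utterance
    return min_loss.mean(), [perms[i] for i in best.tolist()]

est = torch.randn(8, 2, 16000)   # hypothetical separator outputs for 2 speakers
ref = torch.randn(8, 2, 16000)   # reference sources
loss, assignments = pit_loss(est, ref)
```

The permutation picked per utterance (`assignments` above) is the label assignment; under plain PIT it may switch between epochs, which is the instability the proposed self-supervised pre-training aims to reduce.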
Related papers
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens [31.568675300434816]
Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset.
During inference time, they are utilized differently, generating text sequentially and auto-regressively by using previously generated tokens as input to predict the next one.
This paper proposes two simple approaches based on the model's own generations to address this discrepancy between training and inference.
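As a generic illustration of the discrepancy (not the paper's two proposed fixes), the sketch below contrasts teacher-forced training, where each position conditions on the ground-truth prefix, with autoregressive generation, where each step conditions on the model's own previous outputs; the tiny GRU language model is purely hypothetical.

```python
# Generic illustration of the training/inference mismatch (hypothetical tiny LM;
# this is not the paper's proposed method).
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):                  # tokens: (batch, seq)
        hidden, _ = self.rnn(self.emb(tokens))
        return self.out(hidden)                 # logits: (batch, seq, vocab)

lm = TinyLM()
tokens = torch.randint(0, 100, (4, 16))

# Training: every position is conditioned on the ground-truth prefix (teacher forcing).
logits = lm(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 100), tokens[:, 1:].reshape(-1))

# Inference: each step is conditioned on previously *generated* tokens instead.
generated = tokens[:, :1]
for _ in range(15):
    next_token = lm(generated)[:, -1].argmax(dim=-1, keepdim=True)
    generated = torch.cat([generated, next_token], dim=1)
```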
arXiv Detail & Related papers (2024-10-18T17:48:27Z)
- Rethinking Self-training for Semi-supervised Landmark Detection: A Selection-free Approach [4.511384690621755]
Self-Training for Landmark Detection (STLD) is a method that does not require explicit pseudo-label selection.
STLD constructs a task curriculum to deal with confirmation bias.
Experiments on three facial and one medical landmark detection benchmark show that STLD outperforms the existing methods.
arXiv Detail & Related papers (2024-04-06T08:45:07Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple Logits Retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Uncertainty-aware Sampling for Long-tailed Semi-supervised Learning [89.98353600316285]
We introduce uncertainty into the modeling process for pseudo-label sampling, taking into account that the model performance on the tailed classes varies over different training stages.
This approach allows the model to perceive the uncertainty of pseudo-labels at different training stages, thereby adaptively adjusting the selection thresholds for different classes.
Compared to baselines such as FixMatch, the proposed UDTS achieves accuracy gains of at least roughly 5.26%, 1.75%, 9.96%, and 1.28% on natural scene image datasets.
arXiv Detail & Related papers (2024-01-09T08:59:39Z)
- Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation [8.489574755691613]
In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation to update the model.
Previous studies showed that PIT is plagued by excessive label assignment switching between adjacent epochs, preventing the model from learning better label assignments.
We propose a novel training strategy, dynamic sample dropout (DSD), which considers previous best label assignments and evaluation metrics to exclude the samples that may negatively impact the learned label assignments during training.
arXiv Detail & Related papers (2023-11-20T21:37:38Z)
- Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training [102.14558233502514]
Masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.
We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2022-06-21T06:08:30Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
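The title suggests relaxing PIT's hard minimum over permutations into a soft one; a generic way to write such a soft-minimum is a temperature-scaled log-sum-exp over the per-permutation losses, sketched below with an assumed temperature parameter (the paper's exact probabilistic formulation may differ).

```python
# Generic soft-minimum over per-permutation losses (a hedged sketch of the idea
# in the title; the paper's exact formulation may differ).
import torch

def softmin_pit_loss(per_perm_losses, tau=1.0):
    """per_perm_losses: (batch, n_perms); tau is an assumed temperature.
    As tau -> 0 this recovers the hard minimum of standard PIT."""
    return (-tau * torch.logsumexp(-per_perm_losses / tau, dim=1)).mean()
```

The (batch, n_perms) loss matrix built in the PIT sketch earlier on this page could be passed to it directly.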
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
- Two-phase Pseudo Label Densification for Self-training based Domain Adaptation [93.03265290594278]
We propose a novel Two-phase Pseudo Label Densification framework, referred to as TPLD.
In the first phase, we use sliding window voting to propagate the confident predictions, utilizing intrinsic spatial-correlations in the images.
In the second phase, we perform a confidence-based easy-hard classification.
To ease the training process and avoid noisy predictions, we introduce the bootstrapping mechanism to the original self-training loss.
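The sliding-window voting step can be pictured with the generic majority-vote sketch below; the -1 encoding for low-confidence pixels and the window size are assumptions, and this is not the authors' exact TPLD procedure.

```python
# Generic sliding-window majority voting to densify a sparse pseudo-label map
# (assumptions: integer class labels >= 0, -1 marks low-confidence pixels;
# not the TPLD paper's exact procedure).
import numpy as np

def densify(pseudo_labels, window=3):
    height, width = pseudo_labels.shape
    out = pseudo_labels.copy()
    r = window // 2
    for i in range(height):
        for j in range(width):
            if pseudo_labels[i, j] != -1:
                continue                                    # already confident
            patch = pseudo_labels[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            votes = patch[patch != -1]                      # confident neighbours only
            if votes.size:
                out[i, j] = np.bincount(votes).argmax()     # propagate the majority label
    return out
```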
arXiv Detail & Related papers (2020-12-09T02:35:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.