Teacher-Student MixIT for Unsupervised and Semi-supervised Speech
Separation
- URL: http://arxiv.org/abs/2106.07843v2
- Date: Wed, 16 Jun 2021 08:25:29 GMT
- Title: Teacher-Student MixIT for Unsupervised and Semi-supervised Speech
Separation
- Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
- Abstract summary: We introduce a novel semi-supervised learning framework for end-to-end speech separation.
The proposed method first uses mixtures of unseparated sources and the mixture invariant training criterion to train a teacher model.
Experiments with single- and multi-channel mixtures show that the teacher-student training resolves the over-separation problem.
- Score: 27.19635746008699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel semi-supervised learning framework for
end-to-end speech separation. The proposed method first uses mixtures of
unseparated sources and the mixture invariant training (MixIT) criterion to
train a teacher model. The teacher model then estimates separated sources that
are used to train a student model with standard permutation invariant training
(PIT). The student model can be fine-tuned with supervised data, i.e., paired
artificial mixtures and clean speech sources, and further improved via model
distillation. Experiments with single- and multi-channel mixtures show that the
teacher-student training resolves the over-separation problem observed in the
original MixIT method. Further, the semi-supervised performance is comparable to
a fully-supervised separation system trained using ten times the amount of
supervised data.
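
The following is a minimal NumPy sketch of the two training criteria the abstract refers to, assuming the usual negative-SNR objective; the function names (neg_snr, mixit_loss, pit_loss) and the toy signals are illustrative and are not taken from the paper's implementation.

```python
# Minimal sketch of the two criteria in the teacher-student pipeline:
# MixIT for the teacher (trained on mixtures of mixtures) and standard PIT
# for the student (trained on the teacher's separated estimates).
import itertools
import numpy as np


def neg_snr(ref, est, eps=1e-8):
    """Negative signal-to-noise ratio in dB (lower is better)."""
    err = ref - est
    return -10.0 * np.log10((ref ** 2).sum() / ((err ** 2).sum() + eps) + eps)


def mixit_loss(mix1, mix2, est_sources):
    """Teacher criterion: each of the M estimated sources is assigned to one of
    the two input mixtures; the best binary assignment gives the loss."""
    m = est_sources.shape[0]
    best = np.inf
    # Each column of the 2 x M mixing matrix holds exactly one 1 -> 2**M assignments.
    for assign in itertools.product([0, 1], repeat=m):
        a = np.asarray(assign)
        remix1 = est_sources[a == 0].sum(axis=0)
        remix2 = est_sources[a == 1].sum(axis=0)
        best = min(best, 0.5 * (neg_snr(mix1, remix1) + neg_snr(mix2, remix2)))
    return best


def pit_loss(ref_sources, est_sources):
    """Student criterion: the best permutation of estimates against references."""
    n = ref_sources.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        loss = np.mean([neg_snr(ref_sources[i], est_sources[p])
                        for i, p in enumerate(perm)])
        best = min(best, loss)
    return best


# Toy usage: two unseparated training mixtures are summed into a mixture of
# mixtures for the teacher; the teacher's estimates then act as pseudo-targets
# for the student. Noisy copies of the true sources stand in for model outputs.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 16000))          # four hypothetical 1-second sources
mix1, mix2 = sources[:2].sum(axis=0), sources[2:].sum(axis=0)
teacher_est = sources + 0.1 * rng.standard_normal(sources.shape)
student_est = teacher_est[:2] + 0.05 * rng.standard_normal((2, 16000))
print(mixit_loss(mix1, mix2, teacher_est))         # teacher objective on mix1 + mix2
print(pit_loss(teacher_est[:2], student_est))      # student objective vs. teacher estimates
```

In the pipeline described above, the MixIT-trained teacher's estimates replace the unavailable clean references in the student's PIT loss; supervised fine-tuning then swaps those pseudo-targets for real paired sources.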
Related papers
- Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions [3.347388046213879]
We propose Score-of-Mixture Training (SMT), a novel framework for training one-step generative models.
SMT estimates the score of mixture distributions between real and fake samples across multiple noise levels.
Our approach supports both training from scratch (SMT) and distillation using a pretrained diffusion model, which we call Score-of-Mixture Distillation (SMD).
arXiv Detail & Related papers (2025-02-13T18:57:20Z)
- Mix from Failure: Confusion-Pairing Mixup for Long-Tailed Recognition [14.009773753739282]
Long-tailed image recognition considers a real-world class distribution rather than an artificially uniform one.
In this paper, we tackle the problem from a different perspective to augment a training dataset to enhance the sample diversity of minority classes.
Our method, namely Confusion-Pairing Mixup (CP-Mix), estimates the confusion distribution of the model and handles the data deficiency problem.
arXiv Detail & Related papers (2024-11-12T08:08:31Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study how model performance can be predicted as a function of the mixture proportions of the training data.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Twice Class Bias Correction for Imbalanced Semi-Supervised Learning [59.90429949214134]
We introduce a novel approach called Twice Class Bias Correction (TCBC).
We estimate the class bias of the model parameters during the training process.
We apply a secondary correction to the model's pseudo-labels for unlabeled samples.
arXiv Detail & Related papers (2023-12-27T15:06:36Z)
- Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training.
It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models.
While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z)
- Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation [62.021828104757745]
We propose AD-MT, an alternate diverse teaching approach in a teacher-student framework.
It involves a single student model and two non-trainable teacher models that are momentum-updated periodically and randomly in an alternate fashion.
arXiv Detail & Related papers (2023-11-29T02:44:54Z)
- Unsupervised Source Separation via Self-Supervised Training [0.913755431537592]
We introduce two novel unsupervised (blind) source separation methods, which involve self-supervised training from single-channel two-source speech mixtures.
Our first method employs permutation invariant training (PIT) to separate artificially-generated mixtures back into the original mixtures.
We improve upon this first method by creating mixtures of source estimates and employing PIT to separate these new mixtures in a cyclic fashion.
We show that MixPIT outperforms a common baseline (MixIT) on our small dataset (SC09Mix), and they have comparable performance on a standard dataset (LibriMix).
arXiv Detail & Related papers (2022-02-08T14:02:50Z)
- Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both self-supervised learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify the objective function of mixup generation as optimizing local smoothness between two mixed classes.
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferable abilities.
arXiv Detail & Related papers (2021-11-30T14:49:59Z)
- Continual self-training with bootstrapped remixing for speech enhancement [32.68203972471562]
RemixIT is a simple and novel self-supervised training method for speech enhancement.
Our experiments show that RemixIT outperforms several previous state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2021-10-19T16:56:18Z)
- ReMix: Towards Image-to-Image Translation with Limited Data [154.71724970593036]
We propose a data augmentation method (ReMix) to tackle the limited-data issue.
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.
The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results.
arXiv Detail & Related papers (2021-03-31T06:24:10Z)
- Unsupervised Sound Separation Using Mixture Invariant Training [38.0680944898427]
We show that MixIT can achieve competitive performance compared to supervised methods on speech separation.
In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures.
arXiv Detail & Related papers (2020-06-23T02:22:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.