Teacher-Student MixIT for Unsupervised and Semi-supervised Speech
Separation
- URL: http://arxiv.org/abs/2106.07843v2
- Date: Wed, 16 Jun 2021 08:25:29 GMT
- Title: Teacher-Student MixIT for Unsupervised and Semi-supervised Speech
Separation
- Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker
- Abstract summary: We introduce a novel semi-supervised learning framework for end-to-end speech separation.
The proposed method first uses mixtures of unseparated sources and the mixture invariant training criterion to train a teacher model.
Experiments with single- and multi-channel mixtures show that the teacher-student training resolves the over-separation problem.
- Score: 27.19635746008699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel semi-supervised learning framework for
end-to-end speech separation. The proposed method first uses mixtures of
unseparated sources and the mixture invariant training (MixIT) criterion to
train a teacher model. The teacher model then estimates separated sources that
are used to train a student model with standard permutation invariant training
(PIT). The student model can be fine-tuned with supervised data, i.e., paired
artificial mixtures and clean speech sources, and further improved via model
distillation. Experiments with single- and multi-channel mixtures show that the
teacher-student training resolves the over-separation problem observed in the
original MixIT method. Further, the semi-supervised performance is comparable to
a fully-supervised separation system trained using ten times the amount of
supervised data.
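As a concrete illustration of the two criteria named in the abstract, the following is a minimal NumPy sketch of a MixIT-style loss for the teacher step and a PIT-style loss for the student step. The negative-SNR objective, array shapes, and function names are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (NumPy) of the MixIT and PIT criteria described above.
# All shapes, names, and the negative-SNR loss are illustrative assumptions.
import itertools
import numpy as np

def neg_snr(est, ref, eps=1e-8):
    """Negative signal-to-noise ratio (dB) of an estimate against a reference."""
    err = ref - est
    return -10.0 * np.log10((ref ** 2).sum() / ((err ** 2).sum() + eps))

def mixit_loss(est_sources, mixtures):
    """Mixture invariant training (teacher step).

    est_sources: (M, T) source estimates produced from a mixture of mixtures;
    mixtures: (K, T) the original unseparated mixtures (typically K = 2).
    Each estimate is assigned to exactly one mixture; the assignment with the
    lowest summed negative SNR defines the loss.
    """
    K = mixtures.shape[0]
    best = np.inf
    for assign in itertools.product(range(K), repeat=est_sources.shape[0]):
        remix = np.zeros_like(mixtures)
        for src, mix_idx in zip(est_sources, assign):
            remix[mix_idx] += src
        best = min(best, sum(neg_snr(remix[k], mixtures[k]) for k in range(K)))
    return best

def pit_loss(est_sources, references):
    """Permutation invariant training (student step).

    The student's estimates are matched against the teacher's pseudo-sources
    (or clean references during supervised fine-tuning) under the best permutation.
    """
    N = est_sources.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(N)):
        loss = sum(neg_snr(est_sources[i], references[p])
                   for i, p in enumerate(perm)) / N
        best = min(best, loss)
    return best

# Toy usage: random placeholders standing in for network estimates.
rng = np.random.default_rng(0)
mixtures = rng.standard_normal((2, 8000))      # two 1-second mixtures at 8 kHz
teacher_est = rng.standard_normal((4, 8000))   # teacher may output up to 4 sources
student_est = rng.standard_normal((2, 8000))   # student outputs 2 sources
print("teacher MixIT loss:", mixit_loss(teacher_est, mixtures))
print("student PIT loss:  ", pit_loss(student_est, teacher_est[:2]))
```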
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance regarding the mixture proportions in function forms.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Twice Class Bias Correction for Imbalanced Semi-Supervised Learning [59.90429949214134]
We introduce a novel approach called Twice Class Bias Correction (TCBC).
We estimate the class bias of the model parameters during the training process.
We apply a secondary correction to the model's pseudo-labels for unlabeled samples.
arXiv Detail & Related papers (2023-12-27T15:06:36Z)
- Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation [62.021828104757745]
We propose AD-MT, an alternate diverse teaching approach in a teacher-student framework.
It involves a single student model and two non-trainable teacher models that are momentum-updated periodically and randomly in an alternate fashion.
arXiv Detail & Related papers (2023-11-29T02:44:54Z)
- RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing [41.77753005397551]
RemixIT is a self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform (see the remixing sketch after this list).
We show that RemixIT can be combined with any separation model and applied to any semi-supervised and unsupervised domain adaptation task.
arXiv Detail & Related papers (2022-02-17T19:07:29Z)
- Unsupervised Source Separation via Self-Supervised Training [0.913755431537592]
We introduce two novel unsupervised (blind) source separation methods, which involve self-supervised training from single-channel two-source speech mixtures.
Our first method employs permutation invariant training (PIT) to separate artificially-generated mixtures back into the original mixtures.
We improve upon this first method by creating mixtures of source estimates and employing PIT to separate these new mixtures in a cyclic fashion.
We show that MixPIT outperforms a common baseline (MixIT) on our small dataset (SC09Mix), and that they have comparable performance on a standard dataset (LibriMix).
arXiv Detail & Related papers (2022-02-08T14:02:50Z)
- Unsupervised Audio Source Separation Using Differentiable Parametric Source Models [8.80867379881193]
We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
arXiv Detail & Related papers (2022-01-24T11:05:30Z)
- Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup [54.09898347820941]
We propose Scenario-Agnostic Mixup (SAMix) for both self-supervised learning (SSL) and supervised learning (SL) scenarios.
Specifically, we hypothesize and verify the objective function of mixup generation as optimizing local smoothness between two mixed classes.
A label-free generation sub-network is designed, which effectively provides non-trivial mixup samples and improves transferable abilities.
arXiv Detail & Related papers (2021-11-30T14:49:59Z)
- Continual self-training with bootstrapped remixing for speech enhancement [32.68203972471562]
RemixIT is a simple and novel self-supervised training method for speech enhancement.
Our experiments show that RemixIT outperforms several previous state-of-the-art self-supervised methods.
arXiv Detail & Related papers (2021-10-19T16:56:18Z)
- ReMix: Towards Image-to-Image Translation with Limited Data [154.71724970593036]
We propose a data augmentation method (ReMix) to tackle the limited-data issue.
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples.
The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results.
arXiv Detail & Related papers (2021-03-31T06:24:10Z)
- Unsupervised Sound Separation Using Mixture Invariant Training [38.0680944898427]
We show that MixIT can achieve competitive performance compared to supervised methods on speech separation.
In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures.
arXiv Detail & Related papers (2020-06-23T02:22:14Z)
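The bootstrapped remixing referenced in the two RemixIT entries above can be sketched as follows; the batch shapes, the noise-permutation scheme, and all names are illustrative assumptions rather than the RemixIT implementation.

```python
# Minimal sketch of bootstrapped remixing, assuming a teacher that returns
# (speech_estimate, noise_estimate) pairs for a batch of noisy in-domain mixtures.
# Shapes, the shuffling scheme, and names are illustrative assumptions.
import numpy as np

def bootstrapped_remix(teacher_speech, teacher_noise, rng):
    """Shuffle the teacher's noise estimates across the batch and add them to the
    speech estimates, yielding new synthetic mixtures with known pseudo-targets."""
    perm = rng.permutation(teacher_speech.shape[0])
    shuffled_noise = teacher_noise[perm]
    new_mixtures = teacher_speech + shuffled_noise
    # A student model would then be trained to recover (teacher_speech,
    # shuffled_noise) from new_mixtures with an ordinary supervised loss.
    return new_mixtures, teacher_speech, shuffled_noise

# Toy usage with random placeholders standing in for teacher outputs.
rng = np.random.default_rng(0)
batch, samples = 4, 8000
teacher_speech = rng.standard_normal((batch, samples))
teacher_noise = 0.1 * rng.standard_normal((batch, samples))
mixtures, speech_targets, noise_targets = bootstrapped_remix(teacher_speech, teacher_noise, rng)
print(mixtures.shape, speech_targets.shape, noise_targets.shape)
```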
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.