Continual self-training with bootstrapped remixing for speech
enhancement
- URL: http://arxiv.org/abs/2110.10103v1
- Date: Tue, 19 Oct 2021 16:56:18 GMT
- Title: Continual self-training with bootstrapped remixing for speech
enhancement
- Authors: Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar
- Abstract summary: RemixIT is a simple and novel self-supervised training method for speech enhancement.
Our experiments show that RemixIT outperforms several previous state-of-the-art self-supervised methods.
- Score: 32.68203972471562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose RemixIT, a simple and novel self-supervised training method for
speech enhancement. The proposed method is based on a continuous self-training
scheme that overcomes limitations of previous studies, including assumptions
about the in-domain noise distribution and the requirement of access to clean
target signals. Specifically, a separation teacher model is pre-trained on an
out-of-domain dataset and is used to infer estimated target signals for a batch
of in-domain mixtures. Next, we bootstrap the mixing process by generating
artificial mixtures using permuted estimated clean and noise signals. Finally,
the student model is trained using the permuted estimated sources as targets
while we periodically update the teacher's weights using the latest student model.
Our experiments show that RemixIT outperforms several previous state-of-the-art
self-supervised methods under multiple speech enhancement tasks. Additionally,
RemixIT provides a seamless alternative for semi-supervised and unsupervised
domain adaptation for speech enhancement tasks, while being general enough to
be applied to any separation task and paired with any separation model.
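To make the training loop described above concrete, here is a minimal, illustrative PyTorch sketch of one RemixIT-style epoch. The model interface (a separator returning speech and noise estimates), the L1 reconstruction loss, the remixing variant (shuffling only the estimated noises within a batch), and the teacher-update schedule are assumptions made for readability, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def remixit_epoch(teacher, student, optimizer, in_domain_loader,
                  teacher_update_every=None, step=0):
    """One RemixIT-style self-training epoch (illustrative sketch).

    teacher, student: separation models mapping a mixture batch [B, T]
    to estimated (speech, noise) signals, each [B, T]. This interface
    is an assumption, not the paper's released API.
    """
    teacher.eval()
    student.train()
    for mixtures in in_domain_loader:                 # in-domain noisy mixtures [B, T]
        with torch.no_grad():
            est_speech, est_noise = teacher(mixtures)  # teacher pseudo-sources

        # Bootstrapped remixing: permute the estimated noises across the
        # batch and add them to the estimated speech to form new mixtures.
        perm = torch.randperm(est_noise.shape[0])
        remixed = est_speech + est_noise[perm]

        # The student must recover the (permuted) teacher estimates
        # from the artificial mixtures; L1 is a stand-in loss here.
        pred_speech, pred_noise = student(remixed)
        loss = (F.l1_loss(pred_speech, est_speech)
                + F.l1_loss(pred_noise, est_noise[perm]))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Periodically refresh the teacher with the latest student weights
        # (one possible update protocol; the paper's exact schedule may differ).
        step += 1
        if teacher_update_every and step % teacher_update_every == 0:
            teacher.load_state_dict(student.state_dict())
    return step
```

In this setup the teacher is pre-trained on out-of-domain data while the student sees only in-domain mixtures, so no clean in-domain targets are ever required.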
Related papers
- One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion
Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z)
- Diffusion-based speech enhancement with a weighted generative-supervised
learning loss [0.0]
Diffusion-based generative models have recently gained attention in speech enhancement (SE).
We propose augmenting the original diffusion training objective with a mean squared error (MSE) loss that measures the discrepancy between the estimated enhanced speech and the ground-truth clean speech.
arXiv Detail & Related papers (2023-09-19T09:13:35Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning [59.07316955610658]
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized
Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- Self-Adapting Noise-Contrastive Estimation for Energy-Based Models [0.0]
Training energy-based models (EBMs) with noise-contrastive estimation (NCE) is theoretically feasible but practically challenging.
Previous works have explored modelling the noise distribution as a separate generative model, and then concurrently training this noise model with the EBM.
This thesis proposes a self-adapting NCE algorithm which uses static instances of the EBM along its training trajectory as the noise distribution.
arXiv Detail & Related papers (2022-11-03T15:17:43Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative
Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- RemixIT: Continual self-training of speech enhancement models via
bootstrapped remixing [41.77753005397551]
RemixIT is a self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform.
We show that RemixIT can be combined with any separation model and applied to any semi-supervised or unsupervised domain adaptation task.
arXiv Detail & Related papers (2022-02-17T19:07:29Z)
- Teacher-Student MixIT for Unsupervised and Semi-supervised Speech
Separation [27.19635746008699]
We introduce a novel semi-supervised learning framework for end-to-end speech separation.
The proposed method first uses mixtures of unseparated sources and the mixture invariant training criterion to train a teacher model.
Experiments with single- and multi-channel mixtures show that the teacher-student training resolves the over-separation problem.
arXiv Detail & Related papers (2021-06-15T02:26:42Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
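As a rough illustration of the consistency-based selection step summarized in the Jo-SRC entry above, the sketch below scores each sample by how strongly the predictions from two augmented views of it agree and keeps low-divergence samples as likely clean. The symmetric KL (Jensen-Shannon-style) measure, the threshold, and the function name are illustrative assumptions rather than the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def select_likely_clean(model, x_view1, x_view2, threshold=0.05):
    """Score samples by prediction consistency across two augmented views
    (illustrative sketch of a Jo-SRC-style selection step; the divergence
    measure and threshold are assumptions).

    x_view1, x_view2: two differently augmented versions of the same batch.
    Returns a boolean mask of samples treated as likely clean.
    """
    model.eval()
    with torch.no_grad():
        p1 = F.softmax(model(x_view1), dim=-1)   # predictions for view 1
        p2 = F.softmax(model(x_view2), dim=-1)   # predictions for view 2

    # Symmetric (Jensen-Shannon-style) divergence between the two predictive
    # distributions: small values mean the two views agree on the sample.
    m = 0.5 * (p1 + p2)
    js = 0.5 * (F.kl_div(m.log(), p1, reduction='none').sum(-1)
                + F.kl_div(m.log(), p2, reduction='none').sum(-1))
    return js < threshold
```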