Adversarial Permutation Invariant Training for Universal Sound
Separation
- URL: http://arxiv.org/abs/2210.12108v1
- Date: Fri, 21 Oct 2022 17:04:17 GMT
- Title: Adversarial Permutation Invariant Training for Universal Sound
Separation
- Authors: Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà
- Abstract summary: In this work, we complement permutation invariant training (PIT) with adversarial losses but find it challenging with the standard formulation used in speech source separation.
We overcome this challenge with a novel I-replacement context-based adversarial loss, and by training with multiple discriminators.
Our experiments show that by simply improving the loss (keeping the same model and dataset) we obtain a non-negligible improvement of 1.4 dB SI-SNRi in the reverberant FUSS dataset.
- Score: 23.262892768718824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Universal sound separation consists of separating mixes with arbitrary sounds
of different types, and permutation invariant training (PIT) is used to train
source agnostic models that do so. In this work, we complement PIT with
adversarial losses but find it challenging with the standard formulation used
in speech source separation. We overcome this challenge with a novel
I-replacement context-based adversarial loss, and by training with multiple
discriminators. Our experiments show that by simply improving the loss (keeping
the same model and dataset) we obtain a non-negligible improvement of 1.4 dB
SI-SNRi on the reverberant FUSS dataset. We also find adversarial PIT to be
effective at reducing spectral holes, ubiquitous in mask-based separation
models, which highlights the potential relevance of adversarial losses for
source separation.
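The PIT objective the abstract builds on can be sketched as follows: the training loss is the minimum, over all permutations of the model's outputs, of a per-source loss such as negative SI-SNR (the metric the paper reports as SI-SNRi). A minimal illustrative sketch with NumPy, not the paper's implementation:

```python
# Minimal PIT sketch, assuming estimates and references are numpy
# arrays of shape (n_sources, n_samples). Illustrative only.
import itertools
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB for a single source."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    # Project the estimate onto the reference (scale invariance).
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps))

def pit_loss(estimates, references):
    """Negative SI-SNR minimized over all output-source permutations."""
    n = len(references)
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = -np.mean([si_snr(estimates[p], references[i])
                         for i, p in enumerate(perm)])
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm
```

The factorial search over permutations is affordable for the small source counts typical of universal separation; the adversarial variant proposed in the paper changes the loss, not this assignment mechanism.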
Related papers
- Single-channel speech enhancement using learnable loss mixup [23.434378634735676]
Generalization remains a major problem in supervised learning of single-channel speech enhancement.
We propose learnable loss mixup (LLM), a simple and effortless training scheme, to improve the generalization of deep learning-based speech enhancement models.
Our experimental results on the VCTK benchmark show that learnable loss mixup achieves 3.26 PESQ, outperforming the state-of-the-art.
arXiv Detail & Related papers (2023-12-20T00:25:55Z)
- Meta-Causal Feature Learning for Out-of-Distribution Generalization [71.38239243414091]
This paper presents a balanced meta-causal learner (BMCL), which includes a balanced task generation module (BTG) and a meta-causal feature learning module (MCFL).
BMCL effectively identifies the class-invariant visual regions for classification and may serve as a general framework to improve the performance of the state-of-the-art methods.
arXiv Detail & Related papers (2022-08-22T09:07:02Z)
- Heterogeneous Target Speech Separation [52.05046029743995]
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts.
Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts.
arXiv Detail & Related papers (2022-04-07T17:14:20Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By feeding the predicted discrete symbol sequence into the synthesis model, each target speech signal can be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-lasting problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
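The soft-minimum idea can be illustrated by replacing standard PIT's hard minimum over permutations with a temperature-controlled softmin. This is a hedged sketch of the concept only; the paper's exact probabilistic formulation may differ:

```python
# Soft-minimum over permutation losses: permutations are weighted by
# softmax(-loss / tau) instead of a hard argmin. Illustrative sketch,
# not the paper's formulation; tau is a hypothetical temperature.
import itertools
import numpy as np

def soft_min_pit(estimates, references, tau=1.0):
    """Softmin-weighted average of per-permutation MSE losses."""
    n = len(references)
    losses = np.array([
        np.mean([(estimates[p] - references[i]) ** 2
                 for i, p in enumerate(perm)])
        for perm in itertools.permutations(range(n))
    ])
    weights = np.exp(-losses / tau)
    weights /= weights.sum()
    # As tau -> 0 this recovers the hard minimum of standard PIT.
    return float(np.sum(weights * losses))
```

The smooth weighting keeps gradients flowing through all assignments rather than only the currently-best one, which is one way a probabilistic relaxation can ease the assignment search.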
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
- On permutation invariant training for speech source separation [20.82852423999727]
We study permutation invariant training (PIT), which addresses the permutation ambiguity problem for speaker-independent source separation models.
First, we look at the two-stage speaker separation and tracking algorithm based on frame-level PIT (tPIT) and clustering, which was originally proposed for the STFT domain.
Second, we extend a recently proposed auxiliary speaker-ID loss with a deep feature loss based on "problem agnostic speech features", to reduce the local permutation errors made by the utterance-level PIT (uPIT).
arXiv Detail & Related papers (2021-02-09T16:57:32Z)
- Unsupervised Sound Separation Using Mixture Invariant Training [38.0680944898427]
We show that MixIT can achieve competitive performance compared to supervised methods on speech separation.
In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures.
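Mixture invariant training (MixIT), per the entry's title, trades reference sources for reference mixtures: the model separates a mixture of two mixtures into several sources, and the loss searches over binary assignments of estimated sources back to the two mixtures. A minimal sketch under those assumptions, not the authors' code:

```python
# Minimal MixIT-style loss sketch: each of M estimated sources is
# assigned to one of two reference mixtures; the loss is the best
# reconstruction MSE over all 2^M binary assignments. Illustrative only.
import itertools
import numpy as np

def mixit_loss(est_sources, mix1, mix2):
    """Best binary assignment of estimated sources to two mixtures."""
    m = len(est_sources)
    best = np.inf
    for bits in itertools.product([0, 1], repeat=m):
        rec1 = sum(s for s, b in zip(est_sources, bits) if b == 0)
        rec2 = sum(s for s, b in zip(est_sources, bits) if b == 1)
        loss = np.mean((rec1 - mix1) ** 2) + np.mean((rec2 - mix2) ** 2)
        best = min(best, loss)
    return best
```

Because only mixtures are needed as references, this objective can be trained on unlabeled recordings, which is what makes the approach unsupervised.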
arXiv Detail & Related papers (2020-06-23T02:22:14Z)
- Adaptive Adversarial Logits Pairing [65.51670200266913]
The adversarial training solution Adversarial Logits Pairing (ALP) tends to rely on fewer high-contribution features than vulnerable models do.
Motivated by these observations, we design an Adaptive Adversarial Logits Pairing (AALP) solution by modifying the training process and training target of ALP.
AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features.
arXiv Detail & Related papers (2020-05-25T03:12:20Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that improves generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.