Optimal Condition Training for Target Source Separation
- URL: http://arxiv.org/abs/2211.05927v1
- Date: Fri, 11 Nov 2022 00:04:55 GMT
- Title: Optimal Condition Training for Target Source Separation
- Authors: Efthymios Tzinis, Gordon Wichern, Paris Smaragdis and Jonathan Le Roux
- Abstract summary: We propose a new optimal condition training method for single-channel target source separation.
We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
- Score: 56.86138859538063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent research has shown remarkable performance in leveraging multiple
extraneous conditional and non-mutually exclusive semantic concepts for sound
source separation, allowing the flexibility to extract a given target source
based on multiple different queries. In this work, we propose a new optimal
condition training (OCT) method for single-channel target source separation,
based on greedy parameter updates using the highest performing condition among
equivalent conditions associated with a given target source. Our experiments
show that the complementary information carried by the diverse semantic
concepts significantly helps to disentangle and isolate sources of interest
much more efficiently compared to single-conditioned models. Moreover, we
propose a variation of OCT with condition refinement, in which an initial
conditional vector is adapted to the given mixture and transformed to a more
amenable representation for target source extraction. We showcase the
effectiveness of OCT on diverse source separation experiments where it improves
upon permutation invariant models with oracle assignment and obtains
state-of-the-art performance in the more challenging task of text-based source
separation, outperforming even dedicated text-only conditioned models.
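The greedy update at the heart of OCT can be sketched in a few lines. The following is a minimal, hypothetical toy setup, not the paper's actual architecture: the "separator" is a scalar sigmoid gate on the mixture, the loss is MSE, and names such as `oct_step` are our own. What it illustrates is the core selection rule: the loss is evaluated under every equivalent condition for the target source, and only the best-performing (lowest-loss) condition drives the parameter update.

```python
import numpy as np

def oct_step(model_w, mixture, target, conditions, lr=0.01):
    """One illustrative Optimal Condition Training (OCT) step.

    Toy model (hypothetical): estimate = sigmoid(model_w @ condition) * mixture,
    i.e. a scalar mask gated by the condition vector. We score every
    equivalent condition, then update the weights using ONLY the
    best-performing one -- the greedy selection described in the abstract.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Evaluate the separation loss under each equivalent condition.
    losses = []
    for c in conditions:
        gate = sigmoid(model_w @ c)          # scalar mask from this condition
        est = gate * mixture                 # conditioned source estimate
        losses.append(np.mean((est - target) ** 2))

    best = int(np.argmin(losses))            # greedy: keep the best condition
    c = conditions[best]

    # Gradient of the MSE w.r.t. model_w through the sigmoid gate (chain rule),
    # computed only for the selected condition.
    gate = sigmoid(model_w @ c)
    est = gate * mixture
    dL_dgate = np.mean(2.0 * (est - target) * mixture)
    dgate_dw = gate * (1.0 - gate) * c
    model_w = model_w - lr * dL_dgate * dgate_dw

    return model_w, best, losses
```

In the paper's OCT-with-condition-refinement variant, the conditional vector itself is additionally adapted to the given mixture before extraction; that refinement step is omitted from this sketch.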
Related papers
- Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z)
- Complete and separate: Conditional separation with missing target source attribute completion [27.215800308343322]
We present an approach in which a model, given an input mixture and partial semantic information about a target source, is trained to extract additional semantic data.
We then leverage this pre-trained model to improve the separation performance of an uncoupled multi-conditional separation network.
arXiv Detail & Related papers (2023-07-27T03:53:53Z)
- Towards Estimating Transferability using Hard Subsets [25.86053764521497]
We propose HASTE, a new strategy to estimate the transferability of a source model to a particular target task using only a harder subset of target data.
We show that HASTE can be used with any existing transferability metric to improve their reliability.
Our experimental results across multiple source model architectures, target datasets, and transfer learning tasks show that HASTE modified metrics are consistently better or on par with the state of the art transferability metrics.
arXiv Detail & Related papers (2023-01-17T14:50:18Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS).
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)
- Heterogeneous Target Speech Separation [52.05046029743995]
We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts.
Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts.
arXiv Detail & Related papers (2022-04-07T17:14:20Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning [19.854565192491123]
We present a novel multi-modal data augmentation approach named Dizygotic Conditional Variational AutoEncoder (DCVAE).
DCVAE conducts feature synthesis via pairing two Conditional Variational AutoEncoders (CVAEs) with the same seed but different modality conditions in a dizygotic symbiosis manner.
The generated features of two CVAEs are adaptively combined to yield the final feature, which can be converted back into its paired conditions.
arXiv Detail & Related papers (2021-06-28T08:29:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.