Unsupervised Audio Source Separation Using Differentiable Parametric
Source Models
- URL: http://arxiv.org/abs/2201.09592v1
- Date: Mon, 24 Jan 2022 11:05:30 GMT
- Title: Unsupervised Audio Source Separation Using Differentiable Parametric
Source Models
- Authors: Kilian Schulze-Forster, Clement S. J. Doire, Gaël Richard, Roland
Badeau
- Abstract summary: We propose an unsupervised model-based deep learning approach to musical source separation.
A neural network is trained to reconstruct the observed mixture as a sum of the sources.
The experimental evaluation on a vocal ensemble separation task shows that the proposed method outperforms learning-free methods.
- Score: 8.80867379881193
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised deep learning approaches to underdetermined audio source
separation achieve state-of-the-art performance but require a dataset of
mixtures along with their corresponding isolated source signals. Such datasets
can be extremely costly to obtain for musical mixtures. This raises a need for
unsupervised methods. We propose a novel unsupervised model-based deep learning
approach to musical source separation. Each source is modelled with a
differentiable parametric source-filter model. A neural network is trained to
reconstruct the observed mixture as a sum of the sources by estimating the
source models' parameters given their fundamental frequencies. At test time,
soft masks are obtained from the synthesized source signals. The experimental
evaluation on a vocal ensemble separation task shows that the proposed method
outperforms learning-free methods based on nonnegative matrix factorization and
a supervised deep learning baseline. Integrating domain knowledge in the form
of source models into a data-driven method leads to high data efficiency: the
proposed approach achieves good separation quality even when trained on less
than three minutes of audio. This work makes powerful deep learning based
separation usable in scenarios where training data with ground truth is
expensive or nonexistent.
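At test time, the abstract says soft masks are obtained from the synthesized source signals. A minimal sketch of that masking step, assuming magnitude spectrograms of the synthesized sources and the complex mixture STFT (function names here are illustrative, not from the paper's code):

```python
import numpy as np

def soft_masks(source_mags, eps=1e-8):
    """Compute soft (Wiener-like) masks from synthesized source magnitudes.

    source_mags: array of shape (n_sources, freq, time) holding non-negative
    magnitude spectrograms of the synthesized sources.
    Returns masks of the same shape that sum to 1 over the source axis.
    """
    total = source_mags.sum(axis=0, keepdims=True) + eps
    return source_mags / total

def separate(mix_stft, source_mags):
    """Apply the soft masks to the complex mixture STFT (freq, time)."""
    return soft_masks(source_mags) * mix_stft[None, ...]

# toy check: two equal-magnitude sources split the mixture evenly
mags = np.ones((2, 3, 2))
masks = soft_masks(mags)
```

Because the masks sum to one across sources, the masked estimates always re-sum to the observed mixture, which matches the reconstruction-as-a-sum training objective described above.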
Related papers
- Score-based Source Separation with Applications to Digital Communication
Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation [99.19786288094596]
We show how the upper bound can be generalized to the case of random generative models.
We show state-of-the-art results on 2, 3, 5, 10, and 20 speakers on multiple benchmarks.
arXiv Detail & Related papers (2023-01-25T18:21:51Z)
- Deep Active Learning with Noise Stability [24.54974925491753]
Uncertainty estimation for unlabeled data is crucial to active learning.
We propose a novel algorithm that leverages noise stability to estimate data uncertainty.
Our method is generally applicable in various tasks, including computer vision, natural language processing, and structural data analysis.
arXiv Detail & Related papers (2022-05-26T13:21:01Z)
- Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data [26.058278155958668]
We propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet.
Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.
The proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training.
arXiv Detail & Related papers (2021-12-15T05:13:43Z)
- Learning Dynamics from Noisy Measurements using Deep Learning with a Runge-Kutta Constraint [9.36739413306697]
We discuss a methodology to learn differential equation(s) using noisy and sparsely sampled measurements.
In our methodology, the main innovation lies in the integration of deep neural networks with a classical numerical integration method.
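The classical integrator referenced here is a Runge-Kutta scheme. As a hedged illustration of the kind of step such a constraint builds on, here is a standard RK4 step (in the paper's setting the derivative function would be a neural network; this sketch substitutes a known ODE to verify the integrator):

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta (RK4) step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# check against dy/dt = -y, whose exact solution is exp(-t)
f = lambda t, y: -y
y, t, h = 1.0, 0.0, 0.01
for _ in range(100):
    y = rk4_step(f, t, y, h)
    t += h
# after integrating to t = 1, y should be close to exp(-1)
```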
arXiv Detail & Related papers (2021-09-23T15:43:45Z)
- A Review of Sound Source Localization with Deep Learning Methods [71.18444724397486]
This article is a review on deep learning methods for single and multiple sound source localization.
We provide an exhaustive topography of the neural-based localization literature in this context.
Tables summarizing the literature review are provided at the end of the review for a quick search of methods with a given set of target characteristics.
arXiv Detail & Related papers (2021-09-08T07:25:39Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
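The oracle principle mentioned here builds on the ideal ratio mask, which uses the ground-truth sources to compute the best achievable ratio mask. A minimal sketch of that oracle mask (the function name and shapes are illustrative, not taken from the paper):

```python
import numpy as np

def ideal_ratio_mask(target_mag, residual_mag, eps=1e-8):
    """Oracle ideal ratio mask: the target's share of the mixture magnitude.

    target_mag:   non-negative magnitude spectrogram of the target source.
    residual_mag: non-negative magnitude spectrogram of everything else
                  in the mix, same shape as target_mag.
    """
    return target_mag / (target_mag + residual_mag + eps)

# toy check: equal-energy target and residual give a 0.5 mask everywhere
target = np.ones((4, 5))
residual = np.ones((4, 5))
mask = ideal_ratio_mask(target, residual)
```

Applying this mask to the mixture and scoring the result gives an upper bound on mask-based separation quality, which is what makes it a useful proxy without training any network.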
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Unsupervised Sound Separation Using Mixture Invariant Training [38.0680944898427]
We show that MixIT can achieve competitive performance compared to supervised methods on speech separation.
In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures.
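MixIT trains on a mixture of two mixtures and assigns each estimated source to one of the two references, keeping the assignment with the lowest loss. A hedged sketch of that loss with a brute-force search over assignments (a simplification of the published method, using MSE instead of the paper's SNR-based loss):

```python
import itertools
import numpy as np

def mixit_loss(mix1, mix2, est_sources):
    """Mixture invariant training loss, brute-forcing the assignment.

    est_sources: (M, T) sources estimated from the mixture of mixtures.
    Each estimated source is assigned to exactly one of the two reference
    mixtures; the assignment minimizing the total MSE defines the loss.
    """
    n_sources = est_sources.shape[0]
    best = np.inf
    for bits in itertools.product([0, 1], repeat=n_sources):
        a = np.array(bits, dtype=float)
        remix1 = (a[:, None] * est_sources).sum(axis=0)
        remix2 = ((1 - a)[:, None] * est_sources).sum(axis=0)
        loss = np.mean((mix1 - remix1) ** 2) + np.mean((mix2 - remix2) ** 2)
        best = min(best, loss)
    return best
```

Because only mixtures (never isolated sources) appear in the loss, training needs no ground-truth source signals, which is what makes the method unsupervised.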
arXiv Detail & Related papers (2020-06-23T02:22:14Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.