Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
- URL: http://arxiv.org/abs/2301.10752v2
- Date: Sat, 24 Jun 2023 05:28:19 GMT
- Title: Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
- Authors: Shahar Lutati and Eliya Nachmani and Lior Wolf
- Abstract summary: We show how the upper bound can be generalized to the case of random generative models.
We show state-of-the-art results on 2, 3, 5, 10, and 20 speakers on multiple benchmarks.
- Score: 99.19786288094596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of speech separation, also known as the cocktail party problem,
refers to the task of isolating a single speech signal from a mixture of speech
signals. Previous work on source separation derived an upper bound for the
source separation task in the domain of human speech. This bound is derived for
deterministic models. Recent advancements in generative models challenge this
bound. We show how the upper bound can be generalized to the case of random
generative models. Applying a diffusion model vocoder that was pretrained to
model single-speaker voices on the output of a deterministic separation model
leads to state-of-the-art separation results. It is shown that this requires
one to combine the output of the separation model with that of the diffusion
model. In our method, a linear combination is performed, in the frequency
domain, using weights that are inferred by a learned model. We show
state-of-the-art results on 2, 3, 5, 10, and 20 speakers on multiple
benchmarks. In particular, for two speakers, our method is able to surpass what
was previously considered the upper performance bound.
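To make the combination step concrete, here is a minimal PyTorch sketch of the frequency-domain blending the abstract describes. `separator`, `diffusion_vocoder`, and `weight_net` are assumed to be pretrained callables, and the per-bin weights are an assumption; none of this is the authors' released code.

```python
# Minimal sketch of the "separate and diffuse" combination step.
# `separator`, `diffusion_vocoder`, and `weight_net` are placeholder
# pretrained models, not the authors' code.
import torch

def separate_and_diffuse(mixture, separator, diffusion_vocoder, weight_net,
                         n_fft=1024, hop=256):
    # 1) Deterministic separation: one estimated waveform per speaker.
    est_sources = separator(mixture)            # (num_speakers, T)

    enhanced = []
    for v_bar in est_sources:
        # 2) Resynthesize each estimate with the single-speaker
        #    diffusion vocoder.
        v_hat = diffusion_vocoder(v_bar)        # (T,)

        # 3) Move both signals to the frequency domain.
        window = torch.hann_window(n_fft)
        V_bar = torch.stft(v_bar, n_fft, hop, window=window,
                           return_complex=True)
        V_hat = torch.stft(v_hat, n_fft, hop, window=window,
                           return_complex=True)

        # 4) A learned model infers the combination weights
        #    (assumed here to be per frequency bin).
        alpha, beta = weight_net(V_bar, V_hat)

        # 5) Linear combination in the frequency domain, then back to time.
        V = alpha * V_bar + beta * V_hat
        enhanced.append(torch.istft(V, n_fft, hop, window=window,
                                    length=v_bar.shape[-1]))
    return torch.stack(enhanced)
```

The learned weights let the model favor the deterministic estimate in some frequency bins and the diffusion vocoder's resynthesis in others.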
Related papers
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using stochastic processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs).
arXiv Detail & Related papers (2023-09-29T03:24:24Z)
- Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model [0.0]
"Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms.
The model has been trained with the LibriMix dataset containing diverse speakers' utterances.
arXiv Detail & Related papers (2023-07-29T15:10:46Z)
- UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model [1.0874597293913013]
UnDiff is a diffusion probabilistic model capable of solving various speech inverse tasks.
It can be adapted to different tasks, including degradation inversion, neural vocoding, and source separation.
arXiv Detail & Related papers (2023-06-01T14:22:55Z)
- An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization [58.88327181933151]
In this paper, we propose an efficient query-based membership inference attack (MIA) that relies on proximal initialization.
Experimental results indicate that the proposed method can achieve competitive performance with only two queries on both discrete-time and continuous-time diffusion models.
To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the text-to-speech task.
arXiv Detail & Related papers (2023-05-26T16:38:48Z)
- Multi-Source Diffusion Models for Simultaneous Music Generation and Separation [17.124189082882395]
We train our model on Slakh2100, a standard dataset for musical source separation.
Our method is the first example of a single model that can handle both generation and separation tasks.
arXiv Detail & Related papers (2023-02-04T23:18:36Z)
- OCD: Learning to Overfit with Conditional Diffusion Models [95.1828574518325]
We present a dynamic model in which the weights are conditioned on an input sample x.
We learn to match those weights that would be obtained by finetuning a base model on x and its label y.
arXiv Detail & Related papers (2022-10-02T09:42:47Z)
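As a rough illustration of the OCD entry above, the sketch below predicts, from an input sample, the weights a base model would reach after finetuning on that sample. The paper conditions a diffusion model for this matching; the sketch swaps in plain regression for brevity, and all names are illustrative.

```python
# Hedged sketch of the OCD idea: map an input sample x to the weights
# the base model would have after finetuning on (x, y). Plain regression
# stands in for the paper's conditional diffusion model.
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    def __init__(self, in_dim, num_weights):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, num_weights),
        )

    def forward(self, x):
        # Map the input sample to a flat vector of base-model weights.
        return self.net(x)

def training_step(predictor, optimizer, x, finetuned_weights):
    # Target: the flattened weights obtained by finetuning the base
    # model on this particular (x, y) pair.
    pred = predictor(x)
    loss = ((pred - finetuned_weights) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```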
- Speech Prediction in Silent Videos using Variational Autoencoders [29.423462898526605]
We present a model for generating speech in a silent video.
The proposed model combines recurrent neural networks and variational deep generative models to learn the conditional distribution of the auditory signal.
We demonstrate the performance of our model on the GRID dataset using standard benchmarks.
arXiv Detail & Related papers (2020-11-14T17:09:03Z)
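A loose sketch of the architecture named in the entry above: a recurrent encoder over visual features feeding a variational latent, and a recurrent decoder producing audio frames. Shapes and module names are assumptions, not the paper's code.

```python
# Illustrative conditional-VAE sketch for video-to-speech prediction.
import torch
import torch.nn as nn

class VideoToSpeechVAE(nn.Module):
    def __init__(self, vis_dim=512, latent_dim=64, audio_dim=80):
        super().__init__()
        self.encoder = nn.GRU(vis_dim, 256, batch_first=True)
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.GRU(latent_dim, 256, batch_first=True)
        self.to_audio = nn.Linear(256, audio_dim)

    def forward(self, video_feats):           # (batch, frames, vis_dim)
        h, _ = self.encoder(video_feats)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a latent per video frame.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        d, _ = self.decoder(z)
        return self.to_audio(d), mu, logvar   # predicted audio frames
```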
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
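The entry above describes a causal encoder-decoder with skip connections operating on raw waveforms. Below is a small illustrative PyTorch sketch of that pattern (not the paper's exact architecture): left-padded convolutions keep the model causal, and an encoder activation is added back in the decoder.

```python
# Toy causal waveform encoder-decoder with one skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalEncDec(nn.Module):
    def __init__(self, channels=32, kernel=8, stride=4):
        super().__init__()
        self.kernel = kernel
        self.enc1 = nn.Conv1d(1, channels, kernel, stride)
        self.enc2 = nn.Conv1d(channels, 2 * channels, kernel, stride)
        self.dec2 = nn.ConvTranspose1d(2 * channels, channels, kernel, stride)
        self.dec1 = nn.ConvTranspose1d(channels, 1, kernel, stride)

    def forward(self, x):                  # x: (batch, 1, time)
        # Left-pad so each convolution sees only past samples (causal).
        e1 = F.relu(self.enc1(F.pad(x, (self.kernel - 1, 0))))
        e2 = F.relu(self.enc2(F.pad(e1, (self.kernel - 1, 0))))
        d2 = F.relu(self.dec2(e2))
        d2 = d2[..., :e1.shape[-1]] + e1   # skip connection from encoder
        d1 = self.dec1(d2)
        return d1[..., :x.shape[-1]]       # trim to input length

# Example: enhance one second of 16 kHz audio.
# out = CausalEncDec()(torch.randn(1, 1, 16000))
```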