Unsupervised vocal dereverberation with diffusion-based generative
models
- URL: http://arxiv.org/abs/2211.04124v1
- Date: Tue, 8 Nov 2022 09:43:01 GMT
- Title: Unsupervised vocal dereverberation with diffusion-based generative
models
- Authors: Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta
Takida, Takao Fukui, Yuki Mitsufuji
- Abstract summary: We propose an unsupervised method to remove a general kind of artificial reverb for music without requiring pairs of data for training.
We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.
- Score: 12.713895991763867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Removing reverb from reverberant music is a necessary technique to clean up
audio for downstream music manipulations. Reverberation in music falls into two
categories: natural reverb and artificial reverb. Artificial reverb has a
wider diversity than natural reverb due to its various parameter setups and
reverberation types. However, recent supervised dereverberation methods may
fail because they rely on sufficiently diverse and numerous pairs of
reverberant observations and retrieved data for training in order to be
generalizable to unseen observations during inference. To resolve these
problems, we propose an unsupervised method that can remove a general kind of
artificial reverb for music without requiring pairs of data for training. The
proposed method is based on diffusion models: it initializes the unknown
reverberation operator with a conventional signal processing technique and
simultaneously refines the estimate with the help of diffusion models. We show
through objective and perceptual evaluations that our method outperforms the
current leading vocal dereverberation benchmarks.
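The abstract names the recipe only at a high level. Below is a minimal sketch of one way it could look, assuming a pretrained clean-vocal score model `score_model`, a per-band exponential-decay reverb operator (the same convention the BUDDy entry below describes, not something this abstract confirms), and a flat initial decay guess standing in for the unspecified "conventional signal processing technique":

```python
# Hedged sketch only: `score_model`, the operator parameterization, and the
# initialization are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def apply_reverb(x, decay, taps=32):
    """Convolve each frequency band of x (freq, time) with a causal
    exponential-decay filter h[n] = decay**n (one decay per band)."""
    n = torch.arange(taps, dtype=x.dtype)
    h = decay.clamp(0.01, 0.99).unsqueeze(1) ** n      # (freq, taps)
    w = h.flip(1).unsqueeze(1)                         # (freq, 1, taps)
    x_pad = F.pad(x.unsqueeze(0), (taps - 1, 0))       # causal left-padding
    return F.conv1d(x_pad, w, groups=x.shape[0]).squeeze(0)

def dereverb(y, score_model, n_steps=50, prior_step=1e-4,
             guidance=1.0, lr_op=1e-2):
    """y: reverberant magnitude spectrogram, shape (freq, time)."""
    # Initialize the unknown reverberation operator (a crude stand-in for
    # the paper's conventional signal-processing initialization).
    decay = torch.full((y.shape[0],), 0.5, requires_grad=True)
    x = torch.randn_like(y)                            # start from noise
    for sigma in torch.logspace(0, -2, n_steps):
        # Prior step: annealed Langevin move toward the clean-vocal manifold.
        eps = prior_step * sigma ** 2
        x = x + eps * score_model(x, sigma) \
              + (2 * eps).sqrt() * torch.randn_like(x)
        # Joint refinement: pull A(x) toward y w.r.t. both x and the operator.
        x = x.detach().requires_grad_(True)
        loss = ((apply_reverb(x, decay) - y) ** 2).sum()
        g_x, g_d = torch.autograd.grad(loss, (x, decay))
        with torch.no_grad():
            decay -= lr_op * g_d                       # refine the operator
        x = (x - guidance * g_x).detach()              # enforce data consistency
    return x, decay
```

The point of the sketch is only the interleaving: a generative prior step, then a data-consistency step that simultaneously refines the reverberation-operator estimate.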
Related papers
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion [93.32354378820648]
We introduce MVSD, a mutual learning framework based on diffusion models.
MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks.
Our framework can improve the performance of both the reverberator and the dereverberator (see the sketch below).
arXiv Detail & Related papers (2024-07-15T00:47:56Z)
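The summary gives only the symmetric-tasks idea; here is a loose cycle-style reading of that reciprocity. All names (`dereverb`, `reverb`, `scene`) are hypothetical, and the real MVSD objective likely differs in detail:

```python
import torch

def mutual_step(dereverb, reverb, wet, dry, scene, opt):
    """One symmetric update: each task's output supervises the inverse task.
    `dereverb` and `reverb` are assumed callables (e.g. conditional diffusion
    samplers) conditioned on a visual scene embedding `scene`."""
    wet_cycle = reverb(dereverb(wet, scene), scene)    # remove, then re-apply
    dry_cycle = dereverb(reverb(dry, scene), scene)    # apply, then remove
    loss = ((wet_cycle - wet) ** 2).mean() + ((dry_cycle - dry) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```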
- BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models [21.66936362048033]
We present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation.
We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined.
arXiv Detail & Related papers (2024-05-07T12:41:31Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model spontaneously discovers disentangled and interpretable directions (see the sketch below).
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
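The shift-control idea can be pictured as a tiny module that nudges the U-Net bottleneck ("h-space") activation along a learnable direction. Everything below, including the unit-norm convention and the shapes, is an illustrative assumption rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class ShiftControl(nn.Module):
    """Learnable directions applied to a frozen diffusion U-Net's
    bottleneck feature map h (hypothetical module, for illustration)."""
    def __init__(self, h_channels, n_directions):
        super().__init__()
        self.directions = nn.Parameter(torch.randn(n_directions, h_channels))

    def forward(self, h, idx, scale=1.0):
        # h: (batch, channels, height, width) bottleneck activation.
        d = self.directions[idx]
        d = d / d.norm()                         # keep the step size in `scale`
        return h + scale * d.view(1, -1, 1, 1)   # broadcast over space
```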
- Unsupervised speech enhancement with diffusion-based generative models [0.0]
We introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
We develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference (see the sketch below).
We show promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method.
arXiv Detail & Related papers (2023-09-19T09:11:31Z)
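A bare-bones sketch of the posterior-sampling idea, with a simple Gaussian observation model standing in for whatever learnt noise model the paper actually uses (`score_model` is the clean-speech prior; all settings are illustrative):

```python
import torch

def posterior_sample(y, score_model, sigma_noise=0.1, n_steps=50, step=1e-4):
    """Annealed Langevin sampling from p(x | y) under y = x + noise:
    each update combines the prior score with the likelihood score."""
    x = torch.randn_like(y)
    for sigma in torch.logspace(0, -2, n_steps):
        eps = step * sigma ** 2
        lik_score = (y - x) / sigma_noise ** 2           # Gaussian likelihood
        x = x + eps * (score_model(x, sigma) + lik_score) \
              + (2 * eps).sqrt() * torch.randn_like(x)
    return x
```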
- Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement, but their iterative sampling makes inference costly.
In this paper, we address this limitation through a two-stage training approach.
We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in the single- and few-step setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z)
- UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model [1.0874597293913013]
UnDiff is a diffusion probabilistic model capable of solving various speech inverse tasks.
It can be adapted to different tasks including degradation inversion, neural vocoding, and source separation.
arXiv Detail & Related papers (2023-06-01T14:22:55Z)
- DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration (see the sketch below).
arXiv Detail & Related papers (2023-03-15T16:14:06Z)
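The noise-to-norm, one-step idea reads roughly as below; `denoiser` and `seg_net` are assumed callables and the noise injection is schematic rather than the paper's exact perturbation:

```python
import torch

def one_step_anomaly(denoiser, seg_net, image, noise_level=0.3):
    """Perturb the input, predict its anomaly-free ("norm") restoration in
    a single call, then score pixels from the (input, restoration) pair."""
    noisy = image + noise_level * torch.randn_like(image)
    restored = denoiser(noisy)                      # one-step noise-to-norm
    return seg_net(torch.cat([image, restored], dim=1))  # pixel-level scores
```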
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process, which is based on a stochastic differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates (see the sampler sketch below).
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
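A 30-step sampler of the kind described above could look like this Euler-Maruyama loop over a reverse-time SDE; the variance-exploding schedule here is a stand-in for the paper's task-adapted process, and `score_model` is an assumed conditional score network:

```python
import math
import torch

def reverse_sde_sample(score_model, y, n_steps=30):
    """Euler-Maruyama solver for dx = [f - g(t)^2 s(x, t, y)] dt + g(t) dW,
    integrated backward from t = 1 to 0 with f = 0 (variance exploding)."""
    s_min, s_max = 0.01, 1.0
    g = lambda t: s_min * (s_max / s_min) ** t * math.sqrt(2 * math.log(s_max / s_min))
    dt = 1.0 / n_steps
    x = y + s_max * torch.randn_like(y)    # initialize around the noisy input
    for i in range(n_steps, 0, -1):
        t = i * dt
        x = x + g(t) ** 2 * score_model(x, t, y) * dt \
              + g(t) * math.sqrt(dt) * torch.randn_like(x)
    return x
```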
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement method (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes an acoustic model trained for automatic speech recognition together with extracted melody features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited (see the sketch below).
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
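A loose sketch of generating replay data "on the fly" from nothing but the trained network, by optimizing random inputs until the model responds confidently; the actual ARM objective is more specific than this entropy heuristic:

```python
import torch

def recall_samples(model, n, shape, steps=100, lr=0.1):
    """Synthesize pseudo-replay inputs from the model's implicit memory."""
    for prm in model.parameters():
        prm.requires_grad_(False)          # freeze the model; optimize inputs only
    x = torch.randn(n, *shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        p = model(x).softmax(dim=1)
        # Low output entropy ~ inputs near what the model already learned.
        entropy = -(p * (p + 1e-8).log()).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    with torch.no_grad():
        targets = model(x).softmax(dim=1)  # distillation targets for replay
    return x.detach(), targets
```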
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.