Unsupervised vocal dereverberation with diffusion-based generative
models
- URL: http://arxiv.org/abs/2211.04124v1
- Date: Tue, 8 Nov 2022 09:43:01 GMT
- Title: Unsupervised vocal dereverberation with diffusion-based generative
models
- Authors: Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta
Takida, Takao Fukui, Yuki Mitsufuji
- Abstract summary: We propose an unsupervised method to remove a general kind of artificial reverb for music without requiring pairs of data for training.
We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.
- Score: 12.713895991763867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Removing reverb from reverberant music is a necessary technique to clean up
audio for downstream music manipulations. Reverberation in music falls into two
categories: natural reverb and artificial reverb. Artificial reverb has a
wider diversity than natural reverb due to its various parameter setups and
reverberation types. However, recent supervised dereverberation methods may
fail because they rely on sufficiently diverse and numerous pairs of
reverberant observations and retrieved data for training in order to be
generalizable to unseen observations during inference. To resolve these
problems, we propose an unsupervised method that can remove a general kind of
artificial reverb for music without requiring pairs of data for training. The
proposed method is based on diffusion models: it initializes the unknown
reverberation operator with a conventional signal processing technique and
simultaneously refines the estimate with the help of diffusion models. We show
through objective and perceptual evaluations that our method outperforms the
current leading vocal dereverberation benchmarks.
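The abstract names the recipe only at a high level. Below is a minimal sketch of one way it could look, assuming a pretrained clean-vocal score model `score_model`, a per-band exponential-decay reverb operator (the same convention the BUDDy entry below describes, not something this abstract confirms), and a flat initial decay guess standing in for the unspecified "conventional signal processing technique":

```python
# Hedged sketch only: `score_model`, the operator parameterization, and the
# initialization are illustrative assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def apply_reverb(x, decay, taps=32):
    """Convolve each frequency band of x (freq, time) with a causal
    exponential-decay filter h[n] = decay**n (one decay per band)."""
    n = torch.arange(taps, dtype=x.dtype)
    h = decay.clamp(0.01, 0.99).unsqueeze(1) ** n      # (freq, taps)
    w = h.flip(1).unsqueeze(1)                         # (freq, 1, taps)
    x_pad = F.pad(x.unsqueeze(0), (taps - 1, 0))       # causal left-padding
    return F.conv1d(x_pad, w, groups=x.shape[0]).squeeze(0)

def dereverb(y, score_model, n_steps=50, prior_step=1e-4,
             guidance=1.0, lr_op=1e-2):
    """y: reverberant magnitude spectrogram, shape (freq, time)."""
    # Initialize the unknown reverberation operator (a crude stand-in for
    # the paper's conventional signal-processing initialization).
    decay = torch.full((y.shape[0],), 0.5, requires_grad=True)
    x = torch.randn_like(y)                            # start from noise
    for sigma in torch.logspace(0, -2, n_steps):
        # Prior step: annealed Langevin move toward the clean-vocal manifold.
        eps = prior_step * sigma ** 2
        x = x + eps * score_model(x, sigma) \
              + (2 * eps).sqrt() * torch.randn_like(x)
        # Joint refinement: pull A(x) toward y w.r.t. both x and the operator.
        x = x.detach().requires_grad_(True)
        loss = ((apply_reverb(x, decay) - y) ** 2).sum()
        g_x, g_d = torch.autograd.grad(loss, (x, decay))
        with torch.no_grad():
            decay -= lr_op * g_d                       # refine the operator
        x = (x - guidance * g_x).detach()              # enforce data consistency
    return x, decay
```

The point of the sketch is only the interleaving: a generative prior step, then a data-consistency step that simultaneously refines the reverberation-operator estimate.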
Related papers
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion [93.32354378820648]
We introduce MVSD, a mutual learning framework based on diffusion models.
MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks.
Our framework can improve the performance of both the reverberator and the dereverberator (see the sketch below).
arXiv Detail & Related papers (2024-07-15T00:47:56Z)
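The summary gives only the symmetric-tasks idea; here is a loose cycle-style reading of that reciprocity. All names (`dereverb`, `reverb`, `scene`) are hypothetical, and the real MVSD objective likely differs in detail:

```python
import torch

def mutual_step(dereverb, reverb, wet, dry, scene, opt):
    """One symmetric update: each task's output supervises the inverse task.
    `dereverb` and `reverb` are assumed callables (e.g. conditional diffusion
    samplers) conditioned on a visual scene embedding `scene`."""
    wet_cycle = reverb(dereverb(wet, scene), scene)    # remove, then re-apply
    dry_cycle = dereverb(reverb(dry, scene), scene)    # apply, then remove
    loss = ((wet_cycle - wet) ** 2).mean() + ((dry_cycle - dry) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```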
- BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models [21.66936362048033]
We present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation.
We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined.
arXiv Detail & Related papers (2024-05-07T12:41:31Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself.
By jointly optimizing them, the model spontaneously discovers disentangled and interpretable directions (see the sketch below).
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
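The shift-control idea can be pictured as a tiny module that nudges the U-Net bottleneck ("h-space") activation along a learnable direction. Everything below, including the unit-norm convention and the shapes, is an illustrative assumption rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class ShiftControl(nn.Module):
    """Learnable directions applied to a frozen diffusion U-Net's
    bottleneck feature map h (hypothetical module, for illustration)."""
    def __init__(self, h_channels, n_directions):
        super().__init__()
        self.directions = nn.Parameter(torch.randn(n_directions, h_channels))

    def forward(self, h, idx, scale=1.0):
        # h: (batch, channels, height, width) bottleneck activation.
        d = self.directions[idx]
        d = d / d.norm()                         # keep the step size in `scale`
        return h + scale * d.view(1, -1, 1, 1)   # broadcast over space
```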
- Unsupervised speech enhancement with diffusion-based generative models [0.0]
We introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
We develop a posterior sampling methodology for speech enhancement by combining the learnt clean speech prior with a noise model for speech signal inference (see the sketch below).
We show promising results compared to a recent variational auto-encoder (VAE)-based unsupervised approach and a state-of-the-art diffusion-based supervised method.
arXiv Detail & Related papers (2023-09-19T09:11:31Z)
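A bare-bones sketch of the posterior-sampling idea, with a simple Gaussian observation model standing in for whatever learnt noise model the paper actually uses (`score_model` is the clean-speech prior; all settings are illustrative):

```python
import torch

def posterior_sample(y, score_model, sigma_noise=0.1, n_steps=50, step=1e-4):
    """Annealed Langevin sampling from p(x | y) under y = x + noise:
    each update combines the prior score with the likelihood score."""
    x = torch.randn_like(y)
    for sigma in torch.logspace(0, -2, n_steps):
        eps = step * sigma ** 2
        lik_score = (y - x) / sigma_noise ** 2           # Gaussian likelihood
        x = x + eps * (score_model(x, sigma) + lik_score) \
              + (2 * eps).sqrt() * torch.randn_like(x)
    return x
```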
- Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement, but their iterative sampling makes inference costly.
In this paper, we address this limitation through a two-stage training approach.
We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in the single- and few-step setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z)
- UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model [1.0874597293913013]
UnDiff is a diffusion probabilistic model capable of solving various speech inverse tasks.
It can be adapted to different tasks including degradation inversion, neural vocoding, and source separation.
arXiv Detail & Related papers (2023-06-01T14:22:55Z)
- DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration (see the sketch below).
arXiv Detail & Related papers (2023-03-15T16:14:06Z)
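The noise-to-norm, one-step idea reads roughly as below; `denoiser` and `seg_net` are assumed callables and the noise injection is schematic rather than the paper's exact perturbation:

```python
import torch

def one_step_anomaly(denoiser, seg_net, image, noise_level=0.3):
    """Perturb the input, predict its anomaly-free ("norm") restoration in
    a single call, then score pixels from the (input, restoration) pair."""
    noisy = image + noise_level * torch.randn_like(image)
    restored = denoiser(noisy)                      # one-step noise-to-norm
    return seg_net(torch.cat([image, restored], dim=1))  # pixel-level scores
```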
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process, which is based on a stochastic differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates (see the sampler sketch below).
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
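A 30-step sampler of the kind described above could look like this Euler-Maruyama loop over a reverse-time SDE; the variance-exploding schedule here is a stand-in for the paper's task-adapted process, and `score_model` is an assumed conditional score network:

```python
import math
import torch

def reverse_sde_sample(score_model, y, n_steps=30):
    """Euler-Maruyama solver for dx = [f - g(t)^2 s(x, t, y)] dt + g(t) dW,
    integrated backward from t = 1 to 0 with f = 0 (variance exploding)."""
    s_min, s_max = 0.01, 1.0
    g = lambda t: s_min * (s_max / s_min) ** t * math.sqrt(2 * math.log(s_max / s_min))
    dt = 1.0 / n_steps
    x = y + s_max * torch.randn_like(y)    # initialize around the noisy input
    for i in range(n_steps, 0, -1):
        t = i * dt
        x = x + g(t) ** 2 * score_model(x, t, y) * dt \
              + g(t) * math.sqrt(dt) * torch.randn_like(x)
    return x
```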
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement method (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes an acoustic model trained for automatic speech recognition together with extracted melody features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited (see the sketch below).
arXiv Detail & Related papers (2020-06-22T15:07:06Z)
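A loose sketch of generating replay data "on the fly" from nothing but the trained network, by optimizing random inputs until the model responds confidently; the actual ARM objective is more specific than this entropy heuristic:

```python
import torch

def recall_samples(model, n, shape, steps=100, lr=0.1):
    """Synthesize pseudo-replay inputs from the model's implicit memory."""
    for prm in model.parameters():
        prm.requires_grad_(False)          # freeze the model; optimize inputs only
    x = torch.randn(n, *shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        p = model(x).softmax(dim=1)
        # Low output entropy ~ inputs near what the model already learned.
        entropy = -(p * (p + 1e-8).log()).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    with torch.no_grad():
        targets = model(x).softmax(dim=1)  # distillation targets for replay
    return x.detach(), targets
```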
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.