Posterior sampling algorithms for unsupervised speech enhancement with
recurrent variational autoencoder
- URL: http://arxiv.org/abs/2309.10439v1
- Date: Tue, 19 Sep 2023 08:59:32 GMT
- Title: Posterior sampling algorithms for unsupervised speech enhancement with
recurrent variational autoencoder
- Authors: Mostafa Sadeghi (MULTISPEECH), Romain Serizel (MULTISPEECH)
- Abstract summary: We address the unsupervised speech enhancement problem based on recurrent variational autoencoder (RVAE)
This approach offers promising generalization performance over the supervised counterpart.
We present efficient sampling techniques based on Langevin dynamics and Metropolis-Hasting algorithms, adapted to the EM-based speech enhancement with RVAE.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the unsupervised speech enhancement problem based
on recurrent variational autoencoder (RVAE). This approach offers promising
generalization performance over the supervised counterpart. Nevertheless, the
involved iterative variational expectation-maximization (VEM) process at test
time, which relies on a variational inference method, results in high
computational complexity. To tackle this issue, we present efficient sampling
techniques based on Langevin dynamics and Metropolis-Hasting algorithms,
adapted to the EM-based speech enhancement with RVAE. By directly sampling from
the intractable posterior distribution within the EM process, we circumvent the
intricacies of variational inference. We conduct a series of experiments,
comparing the proposed methods with VEM and a state-of-the-art supervised
speech enhancement approach based on diffusion models. The results reveal that
our sampling-based algorithms significantly outperform VEM, not only in terms
of computational efficiency but also in overall performance. Furthermore, when
compared to the supervised baseline, our methods showcase robust generalization
performance in mismatched test conditions.
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z) - Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM)
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
arXiv Detail & Related papers (2023-10-06T06:29:06Z) - Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise.
This translates to quicker HPO convergence on empirical benchmarks.
arXiv Detail & Related papers (2023-05-05T15:33:39Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Fast and efficient speech enhancement with variational autoencoders [0.0]
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors.
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
arXiv Detail & Related papers (2022-11-02T09:52:13Z) - Speech Enhancement and Dereverberation with Diffusion-based Generative
Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z) - Speech Enhancement with Score-Based Generative Models in the Complex
STFT Domain [18.090665052145653]
We propose a novel training task for speech enhancement using a complex-valued deep neural network.
We derive this training task within the formalism of differential equations, thereby enabling the use of predictor-corrector samplers.
arXiv Detail & Related papers (2022-03-31T12:53:47Z) - A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement model (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.