Posterior sampling algorithms for unsupervised speech enhancement with
recurrent variational autoencoder
- URL: http://arxiv.org/abs/2309.10439v1
- Date: Tue, 19 Sep 2023 08:59:32 GMT
- Title: Posterior sampling algorithms for unsupervised speech enhancement with
recurrent variational autoencoder
- Authors: Mostafa Sadeghi (MULTISPEECH), Romain Serizel (MULTISPEECH)
- Abstract summary: We address the unsupervised speech enhancement problem based on a recurrent variational autoencoder (RVAE).
This approach offers promising generalization performance over its supervised counterpart.
We present efficient sampling techniques based on Langevin dynamics and Metropolis-Hastings algorithms, adapted to EM-based speech enhancement with RVAE.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the unsupervised speech enhancement problem based
on a recurrent variational autoencoder (RVAE). This approach offers promising
generalization performance over its supervised counterpart. Nevertheless, the
iterative variational expectation-maximization (VEM) process involved at test
time, which relies on a variational inference method, results in high
computational complexity. To tackle this issue, we present efficient sampling
techniques based on Langevin dynamics and Metropolis-Hastings algorithms,
adapted to EM-based speech enhancement with RVAE. By directly sampling from
the intractable posterior distribution within the EM process, we circumvent the
intricacies of variational inference. We conduct a series of experiments,
comparing the proposed methods with VEM and a state-of-the-art supervised
speech enhancement approach based on diffusion models. The results reveal that
our sampling-based algorithms significantly outperform VEM, not only in terms
of computational efficiency but also in overall performance. Furthermore, when
compared to the supervised baseline, our methods showcase robust generalization
performance in mismatched test conditions.
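To make the sampling idea concrete, here is a minimal sketch of one Metropolis-adjusted Langevin (MALA) step for drawing a latent vector z from the intractable posterior p(z | x) ∝ p(x | z) p(z) inside an EM iteration. Everything in it is an illustrative assumption (the linear stand-in decoder, the Gaussian noise model, names such as `mala_step`), not the authors' implementation.

```python
import torch

def log_posterior(z, x, decoder, noise_var):
    # Unnormalized log p(z | x) = log p(x | z) + log p(z), assuming a
    # standard-normal prior on z and Gaussian observation noise with
    # variance noise_var; the decoder stands in for the RVAE decoder.
    x_hat = decoder(z)
    log_lik = -0.5 * torch.sum((x - x_hat) ** 2) / noise_var
    log_prior = -0.5 * torch.sum(z ** 2)
    return log_lik + log_prior

def mala_step(z, x, decoder, noise_var, step):
    # One Langevin proposal followed by a Metropolis-Hastings correction.
    z = z.detach().requires_grad_(True)
    logp = log_posterior(z, x, decoder, noise_var)
    grad, = torch.autograd.grad(logp, z)

    # Langevin proposal: a gradient step on log p(z | x) plus Gaussian noise.
    z_prop = z + step * grad + (2.0 * step) ** 0.5 * torch.randn_like(z)

    z_prop = z_prop.detach().requires_grad_(True)
    logp_prop = log_posterior(z_prop, x, decoder, noise_var)
    grad_prop, = torch.autograd.grad(logp_prop, z_prop)

    def log_q(z_to, z_from, grad_from):
        # Log density (up to a constant) of the proposal q(z_to | z_from).
        diff = z_to - z_from - step * grad_from
        return -torch.sum(diff ** 2) / (4.0 * step)

    # Accept/reject ratio accounting for the asymmetric proposal.
    log_alpha = (logp_prop - logp
                 + log_q(z, z_prop, grad_prop)
                 - log_q(z_prop, z, grad))
    if torch.log(torch.rand(())) < log_alpha:
        return z_prop.detach(), True
    return z.detach(), False

# Toy usage: a linear map stands in for the (hypothetical) RVAE decoder.
decoder = torch.nn.Linear(16, 64)
x = torch.randn(64)   # noisy observation, e.g. one spectrogram frame
z = torch.zeros(16)   # latent vector being sampled
for _ in range(200):
    z, accepted = mala_step(z, x, decoder, noise_var=0.1, step=1e-3)
```

The Metropolis-Hastings correction keeps the chain exact for any finite step size; dropping it recovers plain (unadjusted) Langevin dynamics, trading a little bias for speed.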
Related papers
- Diversified Sampling Improves Scaling LLM inference [31.18762591875725]
DivSampling is a novel and versatile sampling technique designed to enhance the diversity of candidate solutions.
Our theoretical analysis demonstrates that, under mild assumptions, the error rates of responses generated from diverse prompts are significantly lower compared to those produced by stationary prompts.
arXiv Detail & Related papers (2025-02-16T07:37:58Z)
- Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance.
We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point.
Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z)
- Variational Autoencoders for Efficient Simulation-Based Inference [0.3495246564946556]
We present a generative modeling approach based on the variational inference framework for likelihood-free simulation-based inference.
We demonstrate the efficacy of these models on well-established benchmark problems, achieving results comparable to flow-based approaches.
arXiv Detail & Related papers (2024-11-21T12:24:13Z)
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z)
- Optimizing Hyperparameters with Conformal Quantile Regression [7.316604052864345]
We propose to leverage conformalized quantile regression, which makes minimal assumptions about the observation noise.
This translates to quicker HPO convergence on empirical benchmarks.
arXiv Detail & Related papers (2023-05-05T15:33:39Z)
- Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of the fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z)
- Fast and efficient speech enhancement with variational autoencoders [0.0]
Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and includes a total variation-based regularization to incorporate temporal correlations of the latent vectors (see the sketch after this list).
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
arXiv Detail & Related papers (2022-11-02T09:52:13Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a stochastic differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement (DiffuSE) model that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
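As a concrete note on the "Fast and efficient speech enhancement with variational autoencoders" entry above, its total variation (TV) regularization amounts to subtracting a penalty on jumps between consecutive latent vectors from the sampling objective. The sketch below, under the same illustrative assumptions as before (`log_lik_fn` and `tv_weight` are hypothetical names, not the paper's code), shows one such regularized Langevin update on a whole latent sequence.

```python
import torch

def tv_penalty(z_seq, weight):
    # Total variation across time: penalize differences between
    # consecutive latent vectors so sampled trajectories stay smooth.
    return weight * torch.sum(torch.abs(z_seq[1:] - z_seq[:-1]))

def langevin_tv_step(z_seq, log_lik_fn, tv_weight, step):
    # One unadjusted Langevin step on a (T, D) latent sequence, with the
    # TV penalty subtracted from the log-likelihood before differentiating.
    z_seq = z_seq.detach().requires_grad_(True)
    obj = log_lik_fn(z_seq) - tv_penalty(z_seq, tv_weight)
    grad, = torch.autograd.grad(obj, z_seq)
    noise = torch.randn_like(z_seq)
    return (z_seq + step * grad + (2.0 * step) ** 0.5 * noise).detach()
```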
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.