Fast and efficient speech enhancement with variational autoencoders
- URL: http://arxiv.org/abs/2211.02728v1
- Date: Wed, 2 Nov 2022 09:52:13 GMT
- Title: Fast and efficient speech enhancement with variational autoencoders
- Authors: Mostafa Sadeghi (MULTISPEECH), Romain Serizel (MULTISPEECH)
- Abstract summary: Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods.
We propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors.
Our experiments demonstrate that the developed framework makes an effective compromise between computational efficiency and enhancement quality, and outperforms existing methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised speech enhancement based on variational autoencoders has shown
promising performance compared with the commonly used supervised methods. This
approach involves the use of a pre-trained deep speech prior along with a
parametric noise model, where the noise parameters are learned from the noisy
speech signal with an expectation-maximization (EM)-based method. The E-step
involves an intractable latent posterior distribution. Existing algorithms to
solve this step are either based on computationally heavy Markov chain Monte
Carlo (MCMC) sampling methods and variational inference, or on inefficient
optimization-based methods. In this paper, we propose a new approach based on
Langevin dynamics that generates multiple sequences of samples and comes with a
total variation-based regularization to incorporate temporal correlations of
latent vectors. Our experiments demonstrate that the developed framework makes
an effective compromise between computational efficiency and enhancement
quality, and outperforms existing methods.
Related papers
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Hyperparameter Estimation for Sparse Bayesian Learning Models [1.0172874946490507]
Sparse Bayesian Learning (SBL) models are extensively used in signal processing and machine learning for promoting sparsity through hierarchical priors.
This paper presents a framework for improving SBL models under various objective functions.
A novel algorithm is introduced showing enhanced efficiency, especially under low signal-to-noise ratios.
arXiv Detail & Related papers (2024-01-04T21:24:01Z)
- Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder [0.0]
We address the unsupervised speech enhancement problem based on a recurrent variational autoencoder (RVAE).
This approach offers promising generalization performance over the supervised counterpart.
We present efficient sampling techniques based on Langevin dynamics and Metropolis-Hastings algorithms, adapted to the EM-based speech enhancement with RVAE.
arXiv Detail & Related papers (2023-09-19T08:59:32Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as fine-tuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
arXiv Detail & Related papers (2022-09-24T04:26:30Z)
- Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms [8.669461942767098]
We study momentum-based first-order optimization algorithms in which the iterations are subject to additive white noise.
For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification.
We introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- Speech Enhancement and Dereverberation with Diffusion-based Generative Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a stochastic differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement model (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
- Plug-And-Play Learned Gaussian-mixture Approximate Message Passing [71.74028918819046]
We propose a plug-and-play compressed sensing (CS) recovery algorithm suitable for any i.i.d. source prior.
Our algorithm builds upon Borgerding's learned AMP (LAMP), yet significantly improves it by adopting a universal denoising function within the algorithm.
Numerical evaluation shows that the L-GM-AMP algorithm achieves state-of-the-art performance without any knowledge of the source prior.
arXiv Detail & Related papers (2020-11-18T16:40:45Z)
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
- Improving auditory attention decoding performance of linear and non-linear methods using state-space model [21.40315235087551]
Recent advances in electroencephalography have shown that it is possible to identify the target speaker from single-trial EEG recordings.
Auditory attention decoding (AAD) methods reconstruct the attended speech envelope from EEG recordings, based on a linear least-squares cost function or non-linear neural networks.
We investigate a state-space model using correlation coefficients obtained with a small correlation window to improve the decoding performance.
arXiv Detail & Related papers (2020-04-02T09:56:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.