Diffusion Models for Audio Restoration
- URL: http://arxiv.org/abs/2402.09821v2
- Date: Mon, 15 Jul 2024 10:15:12 GMT
- Title: Diffusion Models for Audio Restoration
- Authors: Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann
- Abstract summary: We present here audio restoration algorithms based on diffusion models.
We show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and remarkable sound quality.
We explain the diffusion formalism and its application to the conditional generation of clean audio signals.
- Score: 22.385385150594185
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interferences originating at the recording side or caused by an imperfect transmission pipeline. To address this problem, audio restoration methods aim to recover clean sound signals from the corrupted input data. We present here audio restoration algorithms based on diffusion models, with a focus on speech enhancement and music restoration tasks. Traditional approaches, often grounded in handcrafted rules and statistical heuristics, have shaped our understanding of audio signals. In the past decades, there has been a notable shift towards data-driven methods that exploit the modeling capabilities of DNNs. Deep generative models, and among them diffusion models, have emerged as powerful techniques for learning complex data distributions. However, relying solely on DNN-based learning approaches carries the risk of reducing interpretability, particularly when employing end-to-end models. Nonetheless, data-driven approaches allow more flexibility in comparison to statistical model-based frameworks, whose performance depends on distributional and statistical assumptions that can be difficult to guarantee. Here, we aim to show that diffusion models can combine the best of both worlds and offer the opportunity to design audio restoration algorithms with a good degree of interpretability and a remarkable performance in terms of sound quality. We explain the diffusion formalism and its application to the conditional generation of clean audio signals. We believe that diffusion models open an exciting field of research with the potential to spawn new audio restoration algorithms that are natural-sounding and remain robust in difficult acoustic situations.
Related papers
- The last Dance : Robust backdoor attack via diffusion models and bayesian approach [0.0]
Diffusion models are state-of-the-art deep generative models trained on the principle of learning forward and backward diffusion processes.
We demonstrate the feasibility of backdoor attacks on audio transformers derived from Hugging Face, a popular framework in the world of artificial intelligence research.
(arXiv, 2024-02-05T18:00:07Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
(arXiv, 2023-08-02T22:14:29Z)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features.
Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
(arXiv, 2023-04-10T17:58:42Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generative pre-training of visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
(arXiv, 2023-04-06T17:59:56Z)
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation [88.49030739715701]
This work presents a decomposed diffusion process that resolves the per-frame noise into a base noise shared among all frames and a residual noise that varies along the time axis.
Experiments on various datasets confirm that our approach, termed VideoFusion, surpasses both GAN-based and diffusion-based alternatives in high-quality video generation.
(arXiv, 2023-03-15T02:16:39Z)
- DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises [38.72460741779243]
We introduce DINOISER to facilitate diffusion models for sequence generation by manipulating noises.
Experiments show that DINOISER enables consistent improvement over the baselines of previous diffusion-based sequence generative models.
(arXiv, 2023-02-20T15:14:46Z)
- Removing Structured Noise with Diffusion Models [14.187153638386379]
We show that the powerful paradigm of posterior sampling with diffusion models can be extended to include rich, structured noise models.
We demonstrate strong performance gains across various inverse problems with structured noise, outperforming competitive baselines.
This opens up new opportunities and relevant practical applications of diffusion modeling for inverse problems in the context of non-Gaussian measurement models.
(arXiv, 2023-01-20T23:42:25Z)
- Conditional Diffusion Probabilistic Model for Speech Enhancement [101.4893074984667]
We propose a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.
In our experiments, we demonstrate strong performance of the proposed approach compared to representative generative models.
(arXiv, 2022-02-10T18:58:01Z)
- A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement model (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
(arXiv, 2021-07-25T19:23:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.