Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
- URL: http://arxiv.org/abs/2306.01433v2
- Date: Tue, 30 Jan 2024 15:40:06 GMT
- Title: Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
- Authors: Eloi Moliner, Filip Elvander, Vesa Välimäki
- Abstract summary: In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem.
This paper introduces a novel method called BABE that addresses the blind problem in a zero-shot setting.
BABE exhibits robust generalization capabilities when enhancing real historical recordings.
- Score: 4.030910640265943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to informed methods
when tested with synthetic data. Moreover, BABE exhibits robust generalization
capabilities when enhancing real historical recordings, effectively
reconstructing the missing high-frequency content while maintaining coherence
with the original recording. Subjective preference tests confirm that BABE
significantly improves the audio quality of historical music recordings.
Examples of historical recordings restored with the proposed method are
available on the companion webpage:
(http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/)
Related papers
- Diffusion-based Unsupervised Audio-visual Speech Enhancement [26.937216751657697]
This paper proposes a new unsupervised audiovisual speech enhancement (AVSE) approach.
It combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model.
Experimental results confirm that the proposed AVSE approach not only outperforms its audio-only counterpart but also generalizes better than a recent supervised generative AVSE method.
arXiv Detail & Related papers (2024-10-04T12:22:54Z) - Efficient Autoregressive Audio Modeling via Next-Scale Prediction [52.663934477127405]
We analyze the token length of audio tokenization and propose a novel Scale-level Audio Tokenizer (SAT).
Based on SAT, a scale-level Acoustic AutoRegressive (AAR) modeling framework is proposed, which shifts the next-token AR prediction to next-scale AR prediction.
arXiv Detail & Related papers (2024-08-16T21:48:53Z) - Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models [21.669363620480333]
We present an unsupervised method for blind dereverberation and room impulse response estimation, called BUDDy.
In a blind scenario where the room impulse response is unknown, BUDDy successfully performs speech dereverberation.
Unlike supervised methods, which often struggle to generalize, BUDDy seamlessly adapts to different acoustic conditions.
arXiv Detail & Related papers (2024-08-14T11:31:32Z) - BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models [21.66936362048033]
We present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation.
We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance gets refined.
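The subband parameterization described above can be sketched as a toy operator: each frequency subband is convolved with a noise kernel shaped by an exponential decay, one decay time per band. The band edges, decay times, and gains below are made up for illustration; in BUDDy these parameters are unknown and estimated jointly with the clean speech.

```python
import numpy as np

def subband_reverb(x, sr, band_edges, decays_s, gains):
    # Toy reverberation operator: split the signal into frequency
    # subbands and convolve each with a white-noise kernel shaped by
    # an exponential decay envelope (one decay per subband).
    rng = np.random.default_rng(1)
    n_kernel = int(0.3 * sr)                 # 300 ms reverberation kernel
    t = np.arange(n_kernel) / sr
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, 1.0 / sr)
    wet = np.zeros(x.size + n_kernel - 1)
    for (lo, hi), tau, g in zip(band_edges, decays_s, gains):
        # isolate the subband, then convolve with its decaying kernel
        band = np.fft.irfft(np.where((freqs >= lo) & (freqs < hi), X, 0.0),
                            n=x.size)
        kernel = g * rng.standard_normal(n_kernel) * np.exp(-t / tau)
        wet += np.convolve(band, kernel)
    return wet

sr = 8000
dry = np.random.default_rng(0).standard_normal(sr)  # 1 s stand-in utterance
wet = subband_reverb(dry, sr,
                     band_edges=[(0, 1000), (1000, 3000), (3000, 4000)],
                     decays_s=[0.25, 0.15, 0.08],
                     gains=[1.0, 0.8, 0.5])
```

In the blind setting of the paper, the per-band decay parameters are refined iteratively as the speech estimate improves, mirroring the alternating update in the summary above.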
arXiv Detail & Related papers (2024-05-07T12:41:31Z) - AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio
Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - Diffusion Posterior Sampling for Informed Single-Channel Dereverberation [15.16865739526702]
We present an informed single-channel dereverberation method based on conditional generation with diffusion models.
With knowledge of the room impulse response, the anechoic utterance is generated via reverse diffusion.
The proposed approach is largely more robust to measurement noise compared to a state-of-the-art informed single-channel dereverberation method.
arXiv Detail & Related papers (2023-06-21T14:14:05Z) - DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly
Detection [89.49600182243306]
We reformulate the reconstruction process using a diffusion model into a noise-to-norm paradigm.
We propose a rapid one-step denoising paradigm, significantly faster than the traditional iterative denoising in diffusion models.
The segmentation sub-network predicts pixel-level anomaly scores using the input image and its anomaly-free restoration.
arXiv Detail & Related papers (2023-03-15T16:14:06Z) - Blind Restoration of Real-World Audio by 1D Operational GANs [18.462912387382346]
We propose a novel approach for the blind restoration of real-world audio signals using 1D Operational Generative Adversarial Networks (Op-GANs).
The proposed approach has been evaluated extensively over the benchmark TIMIT-RAR (speech) and GTZAN-RAR (non-speech) datasets.
Average SDR improvements of over 7.2 dB and 4.9 dB are achieved, respectively, which are substantial when compared with the baseline methods.
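For context on the dB figures quoted above: SDR compares the energy of the reference signal to that of the reconstruction error, so a 7.2 dB improvement means the distortion energy drops by a factor of roughly 10^0.72, about 5x. A minimal sketch of the plain energy-ratio definition (benchmark toolkits such as BSS Eval use a more elaborate projection-based variant):

```python
import numpy as np

def sdr_db(reference, estimate):
    # Plain signal-to-distortion ratio in dB:
    # 10 * log10(||s||^2 / ||s - s_hat||^2), with a small epsilon
    # to avoid division by zero for a perfect estimate.
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2)
                           / (np.sum(err ** 2) + 1e-12))

ref = np.ones(4)
est = ref + 0.1          # constant error: distortion energy 4 * 0.01 = 0.04
gain = sdr_db(ref, est)  # 10 * log10(4 / 0.04) = 20 dB
```

Halving the distortion energy raises the SDR by about 3 dB, which is why multi-dB average gains over baselines are considered substantial.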
arXiv Detail & Related papers (2022-12-30T10:11:57Z) - A Study on Speech Enhancement Based on Diffusion Probabilistic Model [63.38586161802788]
We propose a diffusion probabilistic model-based speech enhancement method (DiffuSE) that aims to recover clean speech signals from noisy signals.
The experimental results show that DiffuSE yields performance that is comparable to related audio generative models on the standardized Voice Bank corpus task.
arXiv Detail & Related papers (2021-07-25T19:23:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.