FastWave: Optimized Diffusion Model for Audio Super-Resolution
- URL: http://arxiv.org/abs/2603.04122v1
- Date: Wed, 04 Mar 2026 14:41:02 GMT
- Title: FastWave: Optimized Diffusion Model for Audio Super-Resolution
- Authors: Nikita Kuznetsov, Maksim Kaledin,
- Abstract summary: Audio Super-Resolution is a set of techniques aimed at high-quality estimation of the given signal as if it would be sampled with higher sample rate.<n>We propose a solution by re-considering the recent advances in the training of diffusion models and applying them to super-resolution from any to 48 kHz sample rate.<n>Our model called FastWave has around 50 GFLOPs of computational complexity and 1.3 M parameters and can be trained with less resources and significantly faster than the majority of recently proposed diffusion- and flow-based solutions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio Super-Resolution is a set of techniques aimed at high-quality estimation of the given signal as if it would be sampled with higher sample rate. Among suggested methods there are diffusion and flow models (which are considered slower), generative adversarial networks (which are considered faster), however both approaches are currently presented by high-parametric networks, requiring high computational costs both for training and inference. We propose a solution to both these problems by re-considering the recent advances in the training of diffusion models and applying them to super-resolution from any to 48 kHz sample rate. Our approach shows better results than NU-Wave 2 and is comparable to state-of-the-art models. Our model called FastWave has around 50 GFLOPs of computational complexity and 1.3 M parameters and can be trained with less resources and significantly faster than the majority of recently proposed diffusion- and flow-based solutions. The code has been made publicly available.
Related papers
- CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures.<n>This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism.<n>ChoRDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z) - TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics [40.75121059939763]
We introduce a new sampling method that is up to $186%$ faster than the current state of the art solver for comparative FID on ImageNet512.<n>The key to our method resides in using higher-dimensional initial noise, allowing to produce more detailed samples.
arXiv Detail & Related papers (2025-06-26T20:30:27Z) - Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance.<n>We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point.<n>Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z) - DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for
Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z) - Diffusion Models for Accurate Channel Distribution Generation [19.80498913496519]
Strong generative models can accurately learn channel distributions.
This could save recurring costs for physical measurements of the channel.
The resulting differentiable channel model supports training neural encoders by enabling gradient-based optimization.
arXiv Detail & Related papers (2023-09-19T10:35:54Z) - Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z) - Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely-used diffusion models.
arXiv Detail & Related papers (2023-01-19T09:48:07Z) - Wavelet Diffusion Models are fast and scalable Image Generators [3.222802562733787]
Diffusion models are a powerful solution for high-fidelity image generation, which exceeds GANs in quality in many circumstances.
Recent DiffusionGAN method significantly decreases the models' running time by reducing the number of sampling steps from thousands to several, but their speeds still largely lag behind the GAN counterparts.
This paper aims to reduce the speed gap by proposing a novel wavelet-based diffusion scheme.
We extract low-and-high frequency components from both image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality.
arXiv Detail & Related papers (2022-11-29T12:25:25Z) - Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.