NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
- URL: http://arxiv.org/abs/2206.08545v1
- Date: Fri, 17 Jun 2022 04:40:14 GMT
- Title: NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
- Authors: Seungu Han, Junhyeok Lee
- Abstract summary: We introduce NU-Wave 2, a diffusion model for neural audio upsampling.
It generates 48 kHz audio signals from inputs of various sampling rates with a single model.
We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the input sampling rate.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventionally, audio super-resolution models fixed the initial and the
target sampling rates, which necessitates training a separate model for each pair
of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio
upsampling that enables the generation of 48 kHz audio signals from inputs of
various sampling rates with a single model. Based on the architecture of
NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate
harmonics to resolve the main failure modes of NU-Wave, and incorporates
bandwidth spectral feature transform (BSFT) to condition the bandwidths of
inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2
produces high-resolution audio regardless of the sampling rate of the input while
requiring fewer parameters than other models. The official code and the audio
samples are available at https://mindslab-ai.github.io/nuwave2.
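As a reading aid, here is a minimal PyTorch sketch of how the two components named in the abstract could fit together. The layer sizes, the per-bin band mask used as the bandwidth condition, and the placement of BSFT inside the STFC block are illustrative assumptions, not the authors' implementation (see the official code linked above for that).
```python
# Minimal sketch of the two components named in the abstract. All sizes,
# the band-mask encoding, and the placement of BSFT are assumptions.
import torch
import torch.nn as nn


class BSFT(nn.Module):
    """Bandwidth spectral feature transform: FiLM-style scale and shift in
    the frequency domain, conditioned on a band mask (assumed encoding)."""

    def __init__(self, channels: int):
        super().__init__()
        self.to_scale = nn.Conv2d(1, channels, kernel_size=1)
        self.to_shift = nn.Conv2d(1, channels, kernel_size=1)

    def forward(self, spec, band):
        # spec: (B, 2C, F, T) stacked real/imag features
        # band: (B, 1, F, 1) mask, 1 where the input has energy
        return spec * self.to_scale(band) + self.to_shift(band)


class STFC(nn.Module):
    """Short-time Fourier convolution: STFT -> BSFT + 1x1 conv -> iSTFT,
    so a single layer sees (and can fill) the whole frequency axis."""

    def __init__(self, channels: int, n_fft: int = 1024, hop: int = 256):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.bsft = BSFT(2 * channels)
        self.spec_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x, band):
        B, C, L = x.shape                    # x: features treated as C signals
        win = torch.hann_window(self.n_fft, device=x.device)
        spec = torch.stft(x.reshape(B * C, L), self.n_fft, self.hop,
                          window=win, return_complex=True)    # (B*C, F, T)
        F_, T_ = spec.shape[-2:]
        feat = torch.view_as_real(spec).permute(0, 3, 1, 2)   # (B*C, 2, F, T)
        feat = feat.reshape(B, 2 * C, F_, T_)
        feat = self.spec_conv(self.bsft(feat, band))          # condition, mix
        feat = feat.reshape(B * C, 2, F_, T_).permute(0, 2, 3, 1).contiguous()
        out = torch.istft(torch.view_as_complex(feat), self.n_fft, self.hop,
                          window=win, length=L)
        return out.reshape(B, C, L) + x                       # residual


x = torch.randn(2, 4, 16384)                 # (batch, channels, samples)
band = torch.zeros(2, 1, 1024 // 2 + 1, 1)
band[:, :, :171] = 1.0                       # e.g. input band-limited to ~8 kHz at fs=48 kHz
print(STFC(channels=4)(x, band).shape)       # torch.Size([2, 4, 16384])
```
Operating in the STFT domain gives even a single layer a receptive field spanning the full frequency axis, which is the property the abstract credits for generating harmonics above the input band.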
Related papers
- Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching [51.70360630470263]
Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video.
We propose Frieren, a V2A model based on rectified flow matching (sketched below).
Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment.
arXiv Detail & Related papers (2024-06-01T06:40:22Z)
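For context on the method named above: rectified flow matching learns a velocity field whose noise-to-data paths are nearly straight, so sampling reduces to a few Euler steps of an ODE. A generic sketch follows; the `velocity(x, t, video_feat)` network and the step count are hypothetical stand-ins, not Frieren's actual interface.
```python
import torch


@torch.no_grad()
def rectified_flow_sample(velocity, video_feat, shape, num_steps=25):
    """Generic rectified-flow sampler: integrate dx/dt = v(x, t, cond)
    from t=0 (Gaussian noise) to t=1 (data) with fixed-step Euler.
    `velocity` is a hypothetical trained network, not Frieren's API."""
    x = torch.randn(shape)                       # start at pure noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full(shape[:1], i * dt)        # current time per batch item
        x = x + velocity(x, t, video_feat) * dt  # one Euler step along the flow
    return x
```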
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models, however, are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration [47.07494621683752]
This study proposes a fast and high-quality neural vocoder called WaveFit.
WaveFit integrates the essence of GANs into a DDPM-like iterative framework based on fixed-point iteration (sketched below).
Subjective listening tests found no statistically significant difference in naturalness between natural human speech and speech synthesized by WaveFit with five iterations.
arXiv Detail & Related papers (2022-10-03T15:45:05Z)
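A rough sketch of the fixed-point framing mentioned above: the same learned refinement map is applied repeatedly to an initial noise signal, with clean speech as the intended fixed point. The `denoiser(y, mel)` signature and the gain-normalization step are assumptions, not WaveFit's published definition.
```python
import torch


@torch.no_grad()
def fixed_point_vocode(denoiser, mel, num_samples, iters=5):
    """Apply one learned refinement map repeatedly, starting from noise.
    `denoiser(y, mel)` is a hypothetical stand-in for the trained model."""
    y = torch.randn(num_samples)            # initial estimate: white noise
    for _ in range(iters):                  # five iterations per the summary
        y = denoiser(y, mel)                # seek y* with denoiser(y*, mel) = y*
        y = y / (y.abs().max() + 1e-9)      # crude gain normalization (assumption)
    return y
```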
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than WaveGAN's on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- RAVE: A variational autoencoder for fast and high-quality neural audio synthesis [2.28438857884398]
We introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis.
We show that our model is the first able to generate 48 kHz audio signals while running 20 times faster than real time on a standard laptop CPU.
arXiv Detail & Related papers (2021-11-09T09:07:30Z)
- WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis [80.60577805727624]
WaveGrad 2 is a non-autoregressive generative model for text-to-speech synthesis.
It can generate high fidelity audio, approaching the performance of a state-of-the-art neural TTS system.
arXiv Detail & Related papers (2021-06-17T17:09:21Z)
- Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method [67.24600975813419]
We propose a convolution layer capable of handling arbitrary sampling frequencies with a single deep neural network (the idea is sketched below).
We show that introducing the proposed layer enables a conventional audio source separation model to work consistently even with unseen sampling frequencies.
arXiv Detail & Related papers (2021-05-10T02:33:42Z)
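The impulse invariant method obtains a discrete-time filter by sampling a continuous-time impulse response at whatever rate is in use, which is what makes a single layer sampling-frequency-independent. Below is a minimal sketch under an assumed parametrization (a small bank of damped sinusoids as the analog prototype); it is not the authors' exact layer.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImpulseInvariantConv(nn.Module):
    """Learn a continuous-time impulse response, then sample it at the
    current rate fs to get the discrete kernel (impulse invariant method).
    The damped-sinusoid parametrization is an illustrative assumption."""

    def __init__(self, num_modes: int = 8, kernel_ms: float = 10.0):
        super().__init__()
        self.kernel_ms = kernel_ms                                 # kernel span in ms
        self.amp = nn.Parameter(torch.randn(num_modes) * 0.1)
        self.decay = nn.Parameter(torch.rand(num_modes) * 100.0)  # 1/s
        self.freq = nn.Parameter(torch.rand(num_modes) * 4000.0)  # Hz

    def forward(self, x, fs: int):
        # x: (B, 1, L) waveform sampled at rate fs
        taps = int(self.kernel_ms * fs / 1000.0)
        t = torch.arange(taps, device=x.device, dtype=torch.float32) / fs
        # h(t) = sum_k a_k * exp(-d_k t) * cos(2*pi*f_k t), sampled at t = n/fs
        h = (self.amp[:, None]
             * torch.exp(-F.softplus(self.decay)[:, None] * t)
             * torch.cos(2 * math.pi * self.freq[:, None] * t)).sum(0)
        kernel = (h / fs).flip(0).view(1, 1, -1)           # scale by T = 1/fs
        return F.conv1d(F.pad(x, (taps - 1, 0)), kernel)   # causal convolution
```
Because the learnable parameters live in continuous time, the same layer yields consistent filters at 16 kHz, 24 kHz, or 48 kHz without retraining.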
- NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling [0.0]
NU-Wave is the first neural audio upsampling model to produce 48 kHz waveforms from coarse 16 kHz or 24 kHz inputs.
NU-Wave generates high-quality audio, performing well on signal-to-noise ratio (SNR), log-spectral distance (LSD, sketched below), and ABX test accuracy.
arXiv Detail & Related papers (2021-04-06T06:52:53Z)
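Of the metrics listed above, log-spectral distance is easy to state: the RMS difference between log power spectra, averaged over frames. A minimal sketch of a common formulation (the STFT settings are assumptions; exact values vary between papers):
```python
import torch


def log_spectral_distance(ref, est, n_fft: int = 2048, hop: int = 512):
    """LSD between two 1-D waveforms: RMS difference of log10 power
    spectra per frame, averaged over frames."""
    win = torch.hann_window(n_fft)

    def logspec(x):
        s = torch.stft(x, n_fft, hop, window=win, return_complex=True)
        return torch.log10(s.abs().clamp(min=1e-8) ** 2)

    d = (logspec(ref) - logspec(est)) ** 2  # (F, T)
    return d.mean(dim=0).sqrt().mean()      # RMS over freq, mean over frames
```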
- WaveGrad: Estimating Gradients for Waveform Generation [55.405580817560754]
WaveGrad is a conditional model for waveform generation which estimates gradients of the data density.
It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram.
We find that it can generate high-fidelity audio samples using as few as six iterations (see the sampler sketch below).
arXiv Detail & Related papers (2020-09-02T17:44:10Z)
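The refinement loop described above is the standard denoising-diffusion update, so a generic sketch captures it. The six-entry noise schedule and the `model(y, mel, noise_level)` signature are illustrative assumptions; WaveGrad searches for its short schedules rather than hand-picking them.
```python
import torch


@torch.no_grad()
def wavegrad_style_sample(model, mel, num_samples, betas=None):
    """Iterative refinement from white noise, conditioned on a
    mel-spectrogram, using the standard DDPM-style update."""
    if betas is None:  # illustrative 6-step schedule, not a searched one
        betas = torch.tensor([1e-4, 1e-3, 1e-2, 5e-2, 2e-1, 5e-1])
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    y = torch.randn(num_samples)                  # start from white noise
    for t in reversed(range(len(betas))):
        eps = model(y, mel, alpha_bar[t].sqrt())  # predicted noise component
        # Posterior mean: remove the predicted noise, rescale.
        y = (y - betas[t] / (1 - alpha_bar[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:                                 # re-inject noise except at the end
            y = y + betas[t].sqrt() * torch.randn_like(y)
    return y
```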
This list is automatically generated from the titles and abstracts of the papers on this site.