Deep Audio Waveform Prior
- URL: http://arxiv.org/abs/2207.10441v1
- Date: Thu, 21 Jul 2022 12:25:03 GMT
- Title: Deep Audio Waveform Prior
- Authors: Arnon Turetzky, Tzvi Michelson, Yossi Adi, Shmuel Peleg
- Abstract summary: We show that existing SOTA architectures for audio source separation contain deep priors even when working with the raw waveform.
A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal.
- Score: 19.826973437576395
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Convolutional neural networks contain strong priors for generating natural
looking images [1]. These priors enable image denoising, super resolution, and
inpainting in an unsupervised manner. Previous attempts to demonstrate similar
ideas in audio, namely deep audio priors, (i) use hand picked architectures
such as harmonic convolutions, (ii) only work with spectrogram input, and (iii)
have been used mostly for eliminating Gaussian noise [2]. In this work we show
that existing SOTA architectures for audio source separation contain deep
priors even when working with the raw waveform. Deep priors can be discovered
by training a neural network to generate a single corrupted signal when given
white noise as input. A network with relevant deep priors is likely to generate
a cleaner version of the signal before converging on the corrupted signal. We
demonstrate this restoration effect with several corruptions: background noise,
reverberations, and a gap in the signal (audio inpainting).
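The deep-prior procedure described above can be sketched in a few lines: a network whose only input is a fixed white-noise vector is fit by gradient descent to a single corrupted waveform, and its early iterates tend to be cleaner than the target. The sketch below is a toy stand-in, assuming a one-hidden-layer NumPy net and a synthetic sine instead of the paper's convolutional source-separation architectures; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.linspace(0.0, 1.0, n)
clean = np.sin(2 * np.pi * 8 * t)              # underlying signal (unknown in practice)
corrupted = clean + 0.3 * rng.normal(size=n)   # observed noisy waveform

z = rng.normal(size=64)  # fixed white-noise input, never updated

# Tiny one-hidden-layer net: a hypothetical stand-in for the paper's
# conv source-separation architectures.
W1 = 0.1 * rng.normal(size=(128, 64)); b1 = np.zeros(128)
W2 = 0.1 * rng.normal(size=(n, 128)); b2 = np.zeros(n)

lr = 0.5
losses = []
for step in range(300):
    h = np.tanh(W1 @ z + b1)
    out = W2 @ h + b2
    err = out - corrupted          # the net only ever sees the corrupted signal
    losses.append(float(np.mean(err ** 2)))
    # manual backprop of the mean-squared error
    g_out = 2.0 * err / n
    g_pre = (W2.T @ g_out) * (1.0 - h ** 2)
    W2 -= lr * np.outer(g_out, h); b2 -= lr * g_out
    W1 -= lr * np.outer(g_pre, z); b1 -= lr * g_pre
# In practice one stops early: intermediate outputs tend to lie closer
# to the clean signal than the corrupted fitting target does.
```

The loss against the corrupted signal decreases throughout; the restoration effect comes from reading out the network before it has converged on the corruption.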
Related papers
- Bayesian Formulations for Graph Spectral Denoising [9.086602432203417]
We consider the problem of denoising features associated with complex data, modeled as signals on a graph, via a smoothness prior.
We present algorithms for the cases where the signal is perturbed by Gaussian noise, dropout, and uniformly distributed noise.
We demonstrate the algorithms' ability to effectively restore signals from white noise on image data and from severe dropout in single-cell RNA sequence data.
arXiv Detail & Related papers (2023-11-27T23:53:19Z)
- Unsupervised Denoising for Signal-Dependent and Row-Correlated Imaging Noise [54.0185721303932]
We present the first fully unsupervised deep learning-based denoiser capable of handling imaging noise that is row-correlated.
Our approach uses a Variational Autoencoder with a specially designed autoregressive decoder.
Our method does not require a pre-trained noise model and can be trained from scratch using unpaired noisy data.
arXiv Detail & Related papers (2023-10-11T20:48:20Z)
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS that retains high speech quality while achieving high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
Its synthesis is also 28% faster than that of WaveGAN on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z)
- DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding [71.73405116189531]
We propose a neural vocoder that extracts F0 and timbre/aperiodicity encodings from the input speech, emulating those defined in conventional vocoders.
As the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.
arXiv Detail & Related papers (2021-10-13T01:39:57Z)
- Deep Learning Radio Frequency Signal Classification with Hybrid Images [0.0]
We focus on the different pre-processing steps that can be used on the input training data, and test the results on a fixed Deep Learning architecture.
We propose a hybrid image that takes advantage of both time and frequency domain information, and tackles the classification as a Computer Vision problem.
arXiv Detail & Related papers (2021-05-19T11:12:09Z)
- End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks [54.43697805589634]
We propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs).
Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech.
We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID.
arXiv Detail & Related papers (2021-04-27T17:12:30Z)
- AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis [55.24336227884039]
We present a novel framework to generate high-fidelity talking head video.
We use neural scene representation networks to bridge the gap between audio input and video output.
Our framework can (1) produce high-fidelity and natural results, and (2) support free adjustment of audio signals, viewing directions, and background images.
arXiv Detail & Related papers (2021-03-20T02:58:13Z)
- Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm [0.0]
The integrity of both image and audio is well preserved, while the maximum length of the hidden audio is significantly improved.
We employ a joint deep neural network architecture consisting of two sub-models: the first network hides the secret audio into an image, and the second one is responsible for decoding the image to obtain the original audio.
arXiv Detail & Related papers (2021-02-18T06:13:05Z)
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images [98.82804259905478]
We present Neighbor2Neighbor to train an effective image denoising model with only noisy images.
In detail, the input and target used to train the network are two images sub-sampled from the same noisy image.
The denoising network is trained on these sub-sampled pairs, with a proposed regularizer as an additional loss for better performance.
arXiv Detail & Related papers (2021-01-08T02:03:25Z)
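As a concrete illustration of the Neighbor2Neighbor sub-sampling step, the sketch below builds an (input, target) training pair from one noisy image by picking two different neighboring pixels in every 2x2 cell. It is a minimal NumPy sketch with a random toy image standing in for real noisy data; names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
noisy = rng.normal(size=(64, 64))  # toy stand-in for a single noisy image

def neighbor_subsample(img, rng):
    """Build an (input, target) pair by picking two different
    neighboring pixels inside every 2x2 cell of the image."""
    h, w = img.shape
    inp = np.empty((h // 2, w // 2))
    tgt = np.empty((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            cell = [(2 * i, 2 * j), (2 * i, 2 * j + 1),
                    (2 * i + 1, 2 * j), (2 * i + 1, 2 * j + 1)]
            p, q = rng.choice(4, size=2, replace=False)
            inp[i, j] = img[cell[p]]
            tgt[i, j] = img[cell[q]]
    return inp, tgt

inp, tgt = neighbor_subsample(noisy, rng)
# inp/tgt form a noisy training pair drawn from one image; a denoiser
# trained to map inp -> tgt never needs a clean target.
```

Both halves carry independent noise realizations of nearly the same underlying content, which is what makes them usable as a self-supervised pair.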
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.