DDSP: Differentiable Digital Signal Processing
- URL: http://arxiv.org/abs/2001.04643v1
- Date: Tue, 14 Jan 2020 06:49:37 GMT
- Title: DDSP: Differentiable Digital Signal Processing
- Authors: Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, Adam Roberts
- Abstract summary: We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
- Score: 13.448630251745163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most generative models of audio directly generate samples in one of two
domains: time or frequency. While sufficient to express any signal, these
representations are inefficient, as they do not utilize existing knowledge of
how sound is generated and perceived. A third approach (vocoders/synthesizers)
successfully incorporates strong domain knowledge of signal processing and
perception, but has been less actively researched due to limited expressivity
and difficulty integrating with modern auto-differentiation-based machine
learning methods. In this paper, we introduce the Differentiable Digital Signal
Processing (DDSP) library, which enables direct integration of classic signal
processing elements with deep learning methods. Focusing on audio synthesis, we
achieve high-fidelity generation without the need for large autoregressive
models or adversarial losses, demonstrating that DDSP enables utilizing strong
inductive biases without losing the expressive power of neural networks.
Further, we show that combining interpretable modules permits manipulation of
each separate model component, with applications such as independent control of
pitch and loudness, realistic extrapolation to pitches not seen during
training, blind dereverberation of room acoustics, transfer of extracted room
acoustics to new environments, and transformation of timbre between disparate
sources. In short, DDSP enables an interpretable and modular approach to
generative modeling, without sacrificing the benefits of deep learning. The
library is publicly available at https://github.com/magenta/ddsp and we welcome
further contributions from the community and domain experts.
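The core idea of the abstract — driving a classic synthesizer with time-varying controls that a neural network can output — can be illustrated with a minimal harmonic oscillator. This is a simplified NumPy sketch, not the actual DDSP library API (the real implementation at github.com/magenta/ddsp is in TensorFlow); the function name and shapes here are illustrative assumptions.

```python
import numpy as np

def harmonic_synth(f0, amplitudes, sample_rate=16000):
    """Render a bank of sinusoidal harmonics (illustrative sketch).

    f0:         fundamental frequency per sample, shape (n_samples,)
    amplitudes: per-harmonic amplitude per sample, shape (n_samples, n_harmonics)
    """
    n_samples, n_harmonics = amplitudes.shape
    # Instantaneous phase is the cumulative sum of angular frequency,
    # which keeps the waveform continuous even when f0 varies over time.
    omega = 2.0 * np.pi * f0 / sample_rate            # (n_samples,)
    phase = np.cumsum(omega)                           # (n_samples,)
    harmonic_numbers = np.arange(1, n_harmonics + 1)   # (n_harmonics,)
    phases = phase[:, None] * harmonic_numbers[None, :]
    # Weight each harmonic by its amplitude envelope and sum to one waveform.
    return np.sum(amplitudes * np.sin(phases), axis=-1)

# Example: 0.1 s of a 440 Hz tone with three decaying harmonics.
n = 1600
f0 = np.full(n, 440.0)
amps = np.tile(np.array([0.5, 0.25, 0.125]), (n, 1))
audio = harmonic_synth(f0, amps)
```

Because every operation here is differentiable, gradients from an audio loss can flow back into `f0` and `amplitudes` — which is what lets a network learn to control the synthesizer end to end, and why pitch and loudness remain independently manipulable after training.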
Related papers
- Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises [18.539501941328393]
This paper constructs a latent diffusion model-enabled SemCom system, and proposes three improvements compared to existing works.
A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter.
An end-to-end consistency distillation strategy is used to distill the diffusion models trained in latent space.
arXiv Detail & Related papers (2024-06-09T23:39:31Z)
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- Brain-Driven Representation Learning Based on Diffusion Model [25.375490061512]
Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
arXiv Detail & Related papers (2023-11-14T05:59:58Z)
- Deep Feature Learning for Wireless Spectrum Data [0.5809784853115825]
We propose an approach to learning feature representations for wireless transmission clustering in a completely unsupervised manner.
We show that the automatic representation learning is able to extract fine-grained clusters containing the shapes of the wireless transmission bursts.
arXiv Detail & Related papers (2023-08-07T12:27:19Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We rely on a set of more elementary methods, such as random bounds on a signal, to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Learning Signal-Agnostic Manifolds of Neural Fields [50.066449953522685]
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains.
We show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains.
arXiv Detail & Related papers (2021-11-11T18:57:40Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.