DDSP: Differentiable Digital Signal Processing
- URL: http://arxiv.org/abs/2001.04643v1
- Date: Tue, 14 Jan 2020 06:49:37 GMT
- Title: DDSP: Differentiable Digital Signal Processing
- Authors: Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, Adam Roberts
- Abstract summary: We introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.
We achieve high-fidelity generation without the need for large autoregressive models or adversarial losses.
DDSP enables an interpretable and modular approach to generative modeling, without sacrificing the benefits of deep learning.
- Score: 13.448630251745163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most generative models of audio directly generate samples in one of two
domains: time or frequency. While sufficient to express any signal, these
representations are inefficient, as they do not utilize existing knowledge of
how sound is generated and perceived. A third approach (vocoders/synthesizers)
successfully incorporates strong domain knowledge of signal processing and
perception, but has been less actively researched due to limited expressivity
and difficulty integrating with modern auto-differentiation-based machine
learning methods. In this paper, we introduce the Differentiable Digital Signal
Processing (DDSP) library, which enables direct integration of classic signal
processing elements with deep learning methods. Focusing on audio synthesis, we
achieve high-fidelity generation without the need for large autoregressive
models or adversarial losses, demonstrating that DDSP enables utilizing strong
inductive biases without losing the expressive power of neural networks.
Further, we show that combining interpretable modules permits manipulation of
each separate model component, with applications such as independent control of
pitch and loudness, realistic extrapolation to pitches not seen during
training, blind dereverberation of room acoustics, transfer of extracted room
acoustics to new environments, and transformation of timbre between disparate
sources. In short, DDSP enables an interpretable and modular approach to
generative modeling, without sacrificing the benefits of deep learning. The
library is publicly available at https://github.com/magenta/ddsp and we welcome
further contributions from the community and domain experts.
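The core idea of the abstract — driving a classic synthesizer with time-varying controls that a neural network can output — can be illustrated with a minimal harmonic oscillator. This is a simplified NumPy sketch, not the actual DDSP library API (the real implementation at github.com/magenta/ddsp is in TensorFlow); the function name and shapes here are illustrative assumptions.

```python
import numpy as np

def harmonic_synth(f0, amplitudes, sample_rate=16000):
    """Render a bank of sinusoidal harmonics (illustrative sketch).

    f0:         fundamental frequency per sample, shape (n_samples,)
    amplitudes: per-harmonic amplitude per sample, shape (n_samples, n_harmonics)
    """
    n_samples, n_harmonics = amplitudes.shape
    # Instantaneous phase is the cumulative sum of angular frequency,
    # which keeps the waveform continuous even when f0 varies over time.
    omega = 2.0 * np.pi * f0 / sample_rate            # (n_samples,)
    phase = np.cumsum(omega)                           # (n_samples,)
    harmonic_numbers = np.arange(1, n_harmonics + 1)   # (n_harmonics,)
    phases = phase[:, None] * harmonic_numbers[None, :]
    # Weight each harmonic by its amplitude envelope and sum to one waveform.
    return np.sum(amplitudes * np.sin(phases), axis=-1)

# Example: 0.1 s of a 440 Hz tone with three decaying harmonics.
n = 1600
f0 = np.full(n, 440.0)
amps = np.tile(np.array([0.5, 0.25, 0.125]), (n, 1))
audio = harmonic_synth(f0, amps)
```

Because every operation here is differentiable, gradients from an audio loss can flow back into `f0` and `amplitudes` — which is what lets a network learn to control the synthesizer end to end, and why pitch and loudness remain independently manipulable after training.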
Related papers
- Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises [18.539501941328393]
This paper constructs a latent diffusion model-enabled SemCom system, and proposes three improvements compared to existing works.
A lightweight single-layer latent space transformation adapter completes one-shot learning at the transmitter.
An end-to-end consistency distillation strategy is used to distill the diffusion models trained in latent space.
arXiv Detail & Related papers (2024-06-09T23:39:31Z)
- Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z)
- Brain-Driven Representation Learning Based on Diffusion Model [25.375490061512]
Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
arXiv Detail & Related papers (2023-11-14T05:59:58Z)
- Deep Feature Learning for Wireless Spectrum Data [0.5809784853115825]
We propose an approach to learning feature representations for wireless transmission clustering in a completely unsupervised manner.
We show that the automatic representation learning is able to extract fine-grained clusters containing the shapes of the wireless transmission bursts.
arXiv Detail & Related papers (2023-08-07T12:27:19Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We rely on a set of more elementary methods, such as random bounds on a signal, to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Learning Signal-Agnostic Manifolds of Neural Fields [50.066449953522685]
We leverage neural fields to capture the underlying structure in image, shape, audio and cross-modal audiovisual domains.
We show that by walking across the underlying manifold of GEM, we may generate new samples in our signal domains.
arXiv Detail & Related papers (2021-11-11T18:57:40Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.