Real-time Timbre Remapping with Differentiable DSP
- URL: http://arxiv.org/abs/2407.04547v1
- Date: Fri, 5 Jul 2024 14:32:52 GMT
- Title: Real-time Timbre Remapping with Differentiable DSP
- Authors: Jordie Shier, Charalampos Saitis, Andrew Robertson, Andrew McPherson
- Abstract summary: Timbre is a primary mode of expression in diverse musical contexts.
Our approach draws on the concept of timbre analogies.
We demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.
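The feature difference loss described in the abstract can be sketched in a minimal, illustrative form: compute a timbral feature per musical event, take the differences between successive events in both the input and the synthesized sequence, and penalize the mismatch between those deltas. This sketch uses the spectral centroid as a stand-in feature; the function names and the choice of a single feature are assumptions for illustration, not the paper's actual implementation (which optimizes synthesizer parameters directly via differentiable DSP).

```python
import numpy as np

def spectral_centroid(frame, sr=44100):
    # frequency-weighted mean of the magnitude spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def feature_difference_loss(input_events, synth_events, sr=44100):
    # relative timbral differences between successive events,
    # compared across the input and synthesized sequences
    f_in = np.array([spectral_centroid(e, sr) for e in input_events])
    f_sy = np.array([spectral_centroid(e, sr) for e in synth_events])
    d_in = np.diff(f_in)  # feature deltas within the input phrase
    d_sy = np.diff(f_sy)  # feature deltas within the synthesized phrase
    return np.mean((d_in - d_sy) ** 2)
```

Because the loss is defined on *differences* rather than absolute feature values, the synthesizer is free to occupy a different region of timbre space while still reproducing the graded modulations within the input phrase.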
Related papers
- Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation [52.0893266767733]
We propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features.
To enhance the model's robustness to different synthesizer characteristics, we propose a synthesizer feature augmentation strategy.
arXiv Detail & Related papers (2024-11-14T03:57:21Z) - FM Tone Transfer with Envelope Learning [8.771755521263811]
Tone Transfer is a novel technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while preserving their musical form and content.
However, it presents several shortcomings: poor sound diversity and limited transient and dynamic rendering, which we believe hinder articulation and phrasing in a real-time performance context.
arXiv Detail & Related papers (2023-10-07T14:03:25Z) - Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of a single speaker.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z) - Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer [6.29475963948119]
We propose a differentiable WORLD synthesizer and demonstrate its use in end-to-end audio style transfer tasks.
Our baseline differentiable synthesizer has no model parameters, yet it yields adequate quality synthesis.
An alternative differentiable approach considers extraction of the source spectrum directly, which can improve naturalness.
arXiv Detail & Related papers (2022-08-15T15:48:36Z) - Neural Waveshaping Synthesis [0.0]
We present a novel, lightweight, fully causal approach to neural audio synthesis.
The Neural Waveshaping Unit (NEWT) operates directly in the waveform domain.
It produces complex timbral evolutions by simple affine transformations of its input and output signals.
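The idea of shaping timbre through affine transformations around a nonlinearity can be sketched as below. This is a loose illustration only: `tanh` stands in for the NEWT's learned waveshaping function, and the function and parameter names are assumptions, not the paper's API.

```python
import numpy as np

def newt_like_shaper(x, a_in, b_in, a_out, b_out):
    # affine transform -> fixed nonlinear shaper -> affine transform;
    # varying the affine parameters over time changes the harmonic
    # content of the output without changing the shaper itself
    return a_out * np.tanh(a_in * x + b_in) + b_out
```

Driving `a_in` upward pushes the input further into the saturating region of the shaper, enriching the harmonic content, which is the sense in which simple affine control produces complex timbral evolutions.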
arXiv Detail & Related papers (2021-07-11T13:50:59Z) - DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts noise into a mel-spectrogram conditioned on the music score.
Evaluations conducted on a Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work by a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z) - Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z) - Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
arXiv Detail & Related papers (2020-07-13T12:35:45Z) - VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation.
We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z) - TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer [34.02807083910344]
We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal.
We show that the Constant Q Transform representation is particularly well-suited to convolutional architectures.
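The suitability of the Constant Q Transform for convolutional architectures follows from its geometric frequency spacing: a pitch shift becomes a translation along the frequency axis, which convolutions handle naturally. A minimal sketch of the CQT bin layout (parameter defaults here are illustrative assumptions, e.g. f_min at C1 and 12 bins per octave):

```python
import numpy as np

def cqt_center_freqs(f_min=32.70, bins_per_octave=12, n_bins=84):
    # geometrically spaced center frequencies: f_k = f_min * 2**(k / b),
    # so transposing by one octave shifts the pattern by b bins
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)

def q_factor(bins_per_octave=12):
    # the "constant Q": ratio of center frequency to bandwidth,
    # identical for every bin in the transform
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

Contrast this with the linearly spaced bins of the ordinary STFT, where a pitch shift stretches the spectral pattern rather than translating it.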
arXiv Detail & Related papers (2018-11-22T17:46:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.