Latent Space Explorations of Singing Voice Synthesis using DDSP
- URL: http://arxiv.org/abs/2103.07197v1
- Date: Fri, 12 Mar 2021 10:38:29 GMT
- Title: Latent Space Explorations of Singing Voice Synthesis using DDSP
- Authors: Juan Alonso and Cumhur Erkut
- Abstract summary: Machine learning based singing voice models require large datasets and lengthy training times.
We present a lightweight architecture that is able to output song-like utterances conditioned only on pitch and amplitude.
We present two zero-configuration tools to train new models and experiment with them.
- Score: 2.7920304852537527
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine learning based singing voice models require large datasets and
lengthy training times. In this work we present a lightweight architecture,
based on the Differentiable Digital Signal Processing (DDSP) library, that is
able to output song-like utterances conditioned only on pitch and amplitude,
after twelve hours of training using small datasets of unprocessed audio. The
results are promising, as both the melody and the singer's voice are
recognizable. In addition, we present two zero-configuration tools to train new
models and experiment with them. Currently we are exploring the latent space
representation, which is included in the DDSP library, but not in the original
DDSP examples. Our results indicate that the latent space improves both the
identification of the singer as well as the comprehension of the lyrics. Our
code is available at https://github.com/juanalonso/DDSP-singing-experiments
with links to the zero-configuration notebooks, and our sound examples are at
https://juanalonso.github.io/DDSP-singing-experiments/ .
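The abstract's core idea, synthesizing audio conditioned only on pitch and amplitude, rests on DDSP's differentiable harmonic synthesizer. The sketch below is a hypothetical, non-learned illustration of that control scheme: `harmonic_synth` is not a function from the DDSP library, and in the real system a neural network predicts per-harmonic amplitudes from these controls rather than using a fixed 1/k rolloff.

```python
import numpy as np

def harmonic_synth(f0, amplitude, n_harmonics=20, sample_rate=16000):
    """Render audio from frame-level pitch (Hz) and amplitude controls.

    A crude additive sketch of the harmonic synthesizer at the core of
    DDSP: each harmonic is a sinusoid at an integer multiple of f0,
    all sharing a single overall amplitude envelope.
    """
    hop = 64  # upsample frame-level controls to audio rate
    n_samples = len(f0) * hop
    frames = np.arange(len(f0))
    f0_audio = np.interp(np.arange(n_samples) / hop, frames, f0)
    amp_audio = np.interp(np.arange(n_samples) / hop, frames, amplitude)

    # Integrate instantaneous frequency to get phase, then sum harmonics.
    phase = 2 * np.pi * np.cumsum(f0_audio) / sample_rate
    audio = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        harmonic_freq = k * f0_audio
        # Silence harmonics above Nyquist to avoid aliasing.
        alias_mask = harmonic_freq < sample_rate / 2
        audio += alias_mask * np.sin(k * phase) / k
    return amp_audio * audio / n_harmonics

# Example: a 440 Hz tone with a linear fade-in over 100 frames.
f0 = np.full(100, 440.0)
amp = np.linspace(0.0, 1.0, 100)
audio = harmonic_synth(f0, amp)
```

In the paper's setting, the f0 and amplitude curves are extracted from a singer's recording, which is what lets a small model trained on unprocessed audio reproduce both the melody and the voice.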
Related papers
- Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment [56.019288564115136]
We propose a novel task, text-to-song synthesis, which incorporates both vocal and accompaniment generation.
We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis.
Evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency.
arXiv Detail & Related papers (2024-04-14T18:00:05Z) - WikiMuTe: A web-sourced dataset of semantic descriptions for music audio [7.4327407361824935]
We present WikiMuTe, a new and open dataset containing rich semantic descriptions of music.
The data is sourced from Wikipedia's rich catalogue of articles covering musical works.
We train a model that jointly learns text and audio representations and performs cross-modal retrieval.
arXiv Detail & Related papers (2023-12-14T18:38:02Z) - MAP-Music2Vec: A Simple and Effective Baseline for Self-Supervised Music Audio Representation Learning [41.633972123961094]
Music2Vec is a framework exploring different SSL algorithmic components and tricks for music audio recordings.
Our model achieves comparable results to the state-of-the-art (SOTA) music SSL model Jukebox, despite being significantly smaller with less than 2% of parameters of the latter.
arXiv Detail & Related papers (2022-12-05T16:04:26Z) - AudioGen: Textually Guided Audio Generation [116.57006301417306]
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive model that generates audio samples conditioned on text inputs.
arXiv Detail & Related papers (2022-09-30T10:17:05Z) - Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Self-supervised learning (SSL) tasks reveal different features from the data.
This work aims to combine multiple SSL tasks (Multi-SSL) into a representation that generalizes well across all downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z) - Real-time Timbre Transfer and Sound Synthesis using DDSP [1.7942265700058984]
We present a real-time implementation of the DDSP library embedded in a virtual synthesizer as a plug-in.
We focused on timbre transfer from learned representations of real instruments to arbitrary sound inputs as well as controlling these models by MIDI.
We developed a GUI for intuitive high-level controls which can be used for post-processing and manipulating the parameters estimated by the neural network.
arXiv Detail & Related papers (2021-03-12T11:49:51Z) - Anyone GAN Sing [0.0]
We present a method to synthesize the singing voice of a person using a Convolutional Long Short-term Memory (ConvLSTM) based GAN.
Our work is inspired by WGANSing by Chandna et al.
arXiv Detail & Related papers (2021-02-22T14:30:58Z) - Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z) - dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z) - DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites.
We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.