Score and Lyrics-Free Singing Voice Generation
- URL: http://arxiv.org/abs/1912.11747v2
- Date: Tue, 21 Jul 2020 06:48:42 GMT
- Title: Score and Lyrics-Free Singing Voice Generation
- Authors: Jen-Yu Liu and Yu-Hua Chen and Yin-Cheng Yeh and Yi-Hsuan Yang
- Abstract summary: We explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time.
We implement such models using generative adversarial networks and evaluate them both objectively and subjectively.
- Score: 48.55126268721948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models for singing voice have been mostly concerned with the task
of "singing voice synthesis," i.e., to produce singing voice waveforms given
musical scores and text lyrics. In this work, we explore a novel yet
challenging alternative: singing voice generation without pre-assigned scores
and lyrics, in both training and inference time. In particular, we outline
three such generation schemes, and propose a pipeline to tackle these new
tasks. Moreover, we implement such models using generative adversarial networks
and evaluate them both objectively and subjectively.
Related papers
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
- Singer Identity Representation Learning using Self-Supervised Techniques [0.0]
We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks.
We explore different self-supervised learning techniques on a large collection of isolated vocal tracks.
We evaluate the quality of the resulting representations on singer similarity and identification tasks.
arXiv Detail & Related papers (2024-01-10T10:41:38Z)
- StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis [63.18764165357298]
Style transfer for out-of-domain singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles.
StyleSinger is the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples.
Our evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples.
arXiv Detail & Related papers (2023-12-17T15:26:16Z)
- Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB).
Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre.
We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z)
- A Melody-Unsupervision Model for Singing Voice Synthesis [9.137554315375919]
We propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment in training time.
We show that the proposed model is capable of being trained with speech audio and text labels but can generate singing voice in inference time.
arXiv Detail & Related papers (2021-10-13T07:42:35Z)
- An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures [11.440111473570196]
We use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis.
We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information.
arXiv Detail & Related papers (2021-08-06T08:51:16Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method uses an acoustic model trained for automatic speech recognition, together with melody-extracted features, to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites.
We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
- Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer [11.598416444452619]
We design a multi-singer framework to leverage all the existing singing data of different singers.
We incorporate an adversarial task of singer classification to make the encoder output less singer-dependent.
The proposed synthesizer can generate higher-quality singing voice than the baseline.
arXiv Detail & Related papers (2020-06-18T07:20:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.