SingFake: Singing Voice Deepfake Detection
- URL: http://arxiv.org/abs/2309.07525v2
- Date: Sun, 21 Jan 2024 08:57:40 GMT
- Title: SingFake: Singing Voice Deepfake Detection
- Authors: Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan
- Abstract summary: Singing voices present different acoustic and linguistic characteristics from speech utterances.
We first present SingFake, the first curated in-the-wild dataset, consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips.
We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances.
- Score: 16.82140520915859
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of singing voice synthesis presents critical challenges to artists
and industry stakeholders over unauthorized voice usage. Unlike synthesized
speech, synthesized singing voices are typically released in songs containing
strong background music that may hide synthesis artifacts. Additionally,
singing voices present different acoustic and linguistic characteristics from
speech utterances. These unique properties make singing voice deepfake
detection a relevant but significantly different problem from synthetic speech
detection. In this work, we propose the singing voice deepfake detection task.
We first present SingFake, the first curated in-the-wild dataset consisting of
28.93 hours of bonafide and 29.40 hours of deepfake song clips in five
languages from 40 singers. We provide a train/validation/test split where the
test sets include various scenarios. We then use SingFake to evaluate four
state-of-the-art speech countermeasure systems trained on speech utterances. We
find these systems lag significantly behind their performance on speech test
data. When trained on SingFake, either using separated vocal tracks or song
mixtures, these systems show substantial improvement. However, our evaluations
also identify challenges associated with unseen singers, communication codecs,
languages, and musical contexts, calling for dedicated research into singing
voice deepfake detection. The SingFake dataset and related resources are
available at https://www.singfake.org/.
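Countermeasure systems like the four evaluated here are commonly scored by equal error rate (EER): the operating point where the false acceptance rate on deepfakes equals the false rejection rate on bonafide clips. A minimal sketch of how EER could be computed from a system's detection scores (the function name and threshold sweep are illustrative, not taken from the paper):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal error rate from per-clip detection scores (higher = more bonafide)."""
    # Pool scores and labels: 1 = bonafide, 0 = deepfake
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones(len(bonafide_scores)),
                             np.zeros(len(spoof_scores))])
    # Sweep thresholds over every observed score
    fpr, fnr = [], []
    for t in np.sort(scores):
        decisions = scores >= t          # accept as bonafide
        fpr.append(np.mean(decisions[labels == 0]))   # deepfakes accepted
        fnr.append(np.mean(~decisions[labels == 1]))  # bonafide rejected
    fpr, fnr = np.array(fpr), np.array(fnr)
    # EER is where the two error rates cross
    idx = np.argmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2
```

Chance-level scores give an EER of 0.5 and a perfect separator gives 0.0; the paper's finding that speech-trained systems lag on singing data would show up as EERs well above those on speech test sets.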
Related papers
- GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large, global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores.
We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset.
We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z)
- Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment [56.019288564115136]
We propose a novel task, text-to-song synthesis, which incorporates both vocal and accompaniment generation.
We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis.
Evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency.
arXiv Detail & Related papers (2024-04-14T18:00:05Z)
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables control over singer gender, vocal range, and volume via natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controllability and audio quality.
arXiv Detail & Related papers (2024-03-18T13:39:05Z)
- Singer Identity Representation Learning using Self-Supervised Techniques [0.0]
We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks.
We explore different self-supervised learning techniques on a large collection of isolated vocal tracks.
We evaluate the quality of the resulting representations on singer similarity and identification tasks.
arXiv Detail & Related papers (2024-01-10T10:41:38Z)
- Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task: singing voice beautifying (SVB).
Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre.
We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z)
- Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music [1.2043574473965315]
We propose a new method to identify the singer's name based on analysis of Vietnamese popular music.
We employ the use of vocal segment detection and singing voice separation as the pre-processing steps.
To verify the accuracy of our methods, we evaluate them on a dataset of 300 Vietnamese songs from 18 famous singers.
arXiv Detail & Related papers (2021-02-24T08:03:07Z)
- The Use of Voice Source Features for Sung Speech Recognition [24.129307615741695]
We first use a parallel singing/speaking corpus to illustrate differences in sung vs. spoken voicing characteristics.
We then use this analysis to inform speech recognition experiments on the sung speech DSing corpus.
Experiments are run with three standard (increasingly large) training sets: DSing1 (15.1 hours), DSing3 (44.7 hours), and DSing30 (149.1 hours).
arXiv Detail & Related papers (2021-02-20T15:54:26Z)
- DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites.
We evaluate DeepSinger on our mined singing dataset, which consists of about 92 hours of data from 89 singers in three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
- Addressing the confounds of accompaniments in singer identification [29.949390919663596]
We employ Open-Unmix, an open-source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music.
We then investigate two means to train a singer identification model: by learning from the separated vocal only, or from an augmented set of data.
arXiv Detail & Related papers (2020-02-17T07:49:21Z)
- Score and Lyrics-Free Singing Voice Generation [48.55126268721948]
We explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, at both training and inference time.
We implement such models using generative adversarial networks and evaluate them both objectively and subjectively.
arXiv Detail & Related papers (2019-12-26T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.