Related papers: From Real to Cloned Singer Identification

URL: http://arxiv.org/abs/2407.08647v1
Date: Thu, 11 Jul 2024 16:25:21 GMT
Title: From Real to Cloned Singer Identification
Authors: Dorian Desblancs, Gabriel Meseguer-Brocal, Romain Hennequin, Manuel Moussallam
Abstract summary: We present three embedding models that are trained using a singer-level contrastive learning scheme. We demonstrate that all three models are highly capable of identifying real singers. However, their performance deteriorates when classifying cloned versions of singers in our evaluation set.
Score: 7.407642348217603
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cloned voices of popular singers sound increasingly realistic and have gained popularity over the past few years. They however pose a threat to the industry due to personality rights concerns. As such, methods to identify the original singer in synthetic voices are needed. In this paper, we investigate how singer identification methods could be used for such a task. We present three embedding models that are trained using a singer-level contrastive learning scheme, where positive pairs consist of segments with vocals from the same singers. These segments can be mixtures for the first model, vocals for the second, and both for the third. We demonstrate that all three models are highly capable of identifying real singers. However, their performance deteriorates when classifying cloned versions of singers in our evaluation set. This is especially true for models that use mixtures as an input. These findings highlight the need to understand the biases that exist within singer identification systems, and how they can influence the identification of voice deepfakes in music.
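The singer-level contrastive scheme described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the paper: the function names, the segment identifiers, and the choice of an InfoNCE-style loss are assumptions (the abstract does not name the loss). The sketch only shows the idea that a positive pair is two segments from the same singer, while segments from other singers in the batch act as negatives.

```python
import math
import random


def sample_positive_pairs(segments_by_singer, rng):
    """Draw one (singer, anchor, positive) triple per singer (hypothetical helper).

    Both segments in a pair come from the same singer. Segments could be
    mixtures, isolated vocals, or both, depending on the model variant.
    """
    pairs = []
    for singer, segments in segments_by_singer.items():
        if len(segments) < 2:
            continue  # a positive pair needs two distinct segments
        anchor, positive = rng.sample(segments, 2)
        pairs.append((singer, anchor, positive))
    return pairs


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss (assumed here, not confirmed by the abstract).

    positives[i] is the match for anchors[i]; every other entry of
    `positives` serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

In a real system the embeddings would come from a trained encoder applied to audio segments; here they are plain lists of floats so the loss can be inspected by hand.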

Related papers

Machine Learning Approaches to Vocal Register Classification in Contemporary Male Pop Music [49.1574468325115]
In pop music, where a single artist may use a variety of timbres and textures to achieve a desired quality, it can be difficult to identify what vocal register within the vocal range a singer is using. This paper presents two methods for classifying vocal registers in an audio signal of male pop music through the analysis of textural features of mel-spectrogram images.
arXiv Detail & Related papers (2025-05-16T15:41:28Z)
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control [58.96445085236971]
Zero-shot singing voice synthesis (SVS) with style transfer and style control aims to generate high-quality singing voices with unseen timbres and styles. We introduce TCSinger, the first zero-shot SVS model for style transfer across cross-lingual speech and singing styles. We show that TCSinger outperforms all baseline models in quality synthesis, singer similarity, and style controllability across various tasks.
arXiv Detail & Related papers (2024-09-24T11:18:09Z)
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores. We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset. We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z)
Singer Identity Representation Learning using Self-Supervised Techniques [0.0]
We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks. We evaluate the quality of the resulting representations on singer similarity and identification tasks.
arXiv Detail & Related papers (2024-01-10T10:41:38Z)
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis [63.18764165357298]
Style transfer for out-of-domain singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles. StyleSinger is the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. Our evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples.
arXiv Detail & Related papers (2023-12-17T15:26:16Z)
SingFake: Singing Voice Deepfake Detection [16.82140520915859]
Singing voices present different acoustic and linguistic characteristics from speech utterances. We first present SingFake, the first curated in-the-wild dataset consisting of 28.93 hours of bonafide. We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances.
arXiv Detail & Related papers (2023-09-14T08:49:05Z)
Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB). Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre. We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z)
Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music [1.2043574473965315]
We propose a new method to identify the singer's name based on analysis of Vietnamese popular music. We employ the use of vocal segment detection and singing voice separation as the pre-processing steps. To verify the accuracy of our methods, we evaluate on a dataset of 300 Vietnamese songs from 18 famous singers.
arXiv Detail & Related papers (2021-02-24T08:03:07Z)
Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites. We evaluate DeepSinger on our mined singing dataset that consists of about 92 hours of data from 89 singers in three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer [11.598416444452619]
We design a multi-singer framework to leverage all the existing singing data of different singers. We incorporate an adversarial task of singer classification to make encoder output less singer dependent. The proposed synthesizer can generate higher quality singing voice than baseline.
arXiv Detail & Related papers (2020-06-18T07:20:11Z)
Addressing the confounds of accompaniments in singer identification [29.949390919663596]
We employ open-unmix, an open source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two means to train a singer identification model: by learning from the separated vocal only, or from an augmented set of data.
arXiv Detail & Related papers (2020-02-17T07:49:21Z)
Score and Lyrics-Free Singing Voice Generation [48.55126268721948]
We explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, at both training and inference time. We implement such models using generative adversarial networks and evaluate them both objectively and subjectively.
arXiv Detail & Related papers (2019-12-26T01:45:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.