Addressing the confounds of accompaniments in singer identification
- URL: http://arxiv.org/abs/2002.06817v1
- Date: Mon, 17 Feb 2020 07:49:21 GMT
- Title: Addressing the confounds of accompaniments in singer identification
- Authors: Tsung-Han Hsieh, Kai-Hsiang Cheng, Zhe-Cheng Fan, Yu-Ching Yang,
Yi-Hsuan Yang
- Abstract summary: We employ open-unmix, an open source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music.
We then investigate two means to train a singer identification model: by learning from the separated vocal only, or from an augmented set of data.
- Score: 29.949390919663596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying singers is an important task with many applications. However, the
task remains challenging due to many issues. One major issue is related to the
confounding factors from the background instrumental music that is mixed with
the vocals in music production. A singer identification model may learn to
extract non-vocal related features from the instrumental part of the songs, if
a singer only sings in certain musical contexts (e.g., genres). The model
therefore cannot generalize well when the singer sings in unseen contexts. In
this paper, we attempt to address this issue. Specifically, we employ
open-unmix, an open source tool with state-of-the-art performance in source
separation, to separate the vocal and instrumental tracks of music. We then
investigate two means to train a singer identification model: by learning from
the separated vocal only, or from an augmented set of data where we
"shuffle-and-remix" the separated vocal tracks and instrumental tracks of
different songs to artificially make the singers sing in different contexts. We
also incorporate melodic features learned from the vocal melody contour for
better performance. Evaluation results on a benchmark dataset called the
artist20 show that this data augmentation method greatly improves the accuracy
of singer identification.
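The "shuffle-and-remix" augmentation described above can be sketched as follows. This is a minimal illustration assuming the vocal and instrumental stems have already been separated (e.g., with open-unmix); the function and variable names are ours, not the paper's, and the paper's exact pipeline may differ:

```python
import numpy as np

def shuffle_and_remix(vocals, instrumentals, rng=None):
    """Pair each separated vocal track with the instrumental track of a
    randomly chosen song and mix them, so singers appear in new musical
    contexts. `vocals` and `instrumentals` are lists of equal-length 1-D
    waveform arrays; singer labels follow the vocal track."""
    if rng is None:
        rng = np.random.default_rng()
    # Random permutation decides which instrumental backs which vocal.
    idx = rng.permutation(len(instrumentals))
    # Mix by summing waveforms; vocal i keeps its singer label.
    return [v + instrumentals[j] for v, j in zip(vocals, idx)]
```

A model trained on such remixed clips is discouraged from relying on accompaniment cues, since any singer may now co-occur with any backing track.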
Related papers
- GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks [52.30565320125514]
GTSinger is a large, global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores.
We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset.
We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv Detail & Related papers (2024-09-20T18:18:14Z)
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for the DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
- From Real to Cloned Singer Identification [7.407642348217603]
We present three embedding models that are trained using a singer-level contrastive learning scheme.
We demonstrate that all three models are highly capable of identifying real singers.
However, their performance deteriorates when classifying cloned versions of singers in our evaluation set.
arXiv Detail & Related papers (2024-07-11T16:25:21Z)
- Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment [56.019288564115136]
We propose a novel task called text-to-song synthesis, which incorporates both vocals and accompaniment generation.
We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis.
Evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency.
arXiv Detail & Related papers (2024-04-14T18:00:05Z)
- Singer Identity Representation Learning using Self-Supervised Techniques [0.0]
We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks.
We explore different self-supervised learning techniques on a large collection of isolated vocal tracks.
We evaluate the quality of the resulting representations on singer similarity and identification tasks.
arXiv Detail & Related papers (2024-01-10T10:41:38Z)
- SingFake: Singing Voice Deepfake Detection [16.82140520915859]
Singing voices present different acoustic and linguistic characteristics from speech utterances.
We first present SingFake, the first curated in-the-wild dataset, consisting of 28.93 hours of bonafide song clips.
We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances.
arXiv Detail & Related papers (2023-09-14T08:49:05Z)
- Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data.
We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z)
- SingSong: Generating musical accompaniments from singing [35.819589427197464]
We present SingSong, a system that generates instrumental music to accompany input vocals.
In a pairwise comparison with the same vocal inputs, listeners expressed a significant preference for instrumentals generated by SingSong.
arXiv Detail & Related papers (2023-01-30T04:53:23Z)
- Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB)
Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre.
We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z)
- Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music [1.2043574473965315]
We propose a new method to identify the singer's name based on analysis of Vietnamese popular music.
We employ the use of vocal segment detection and singing voice separation as the pre-processing steps.
To verify the accuracy of our methods, we evaluate on a dataset of 300 Vietnamese songs from 18 famous singers.
arXiv Detail & Related papers (2021-02-24T08:03:07Z)
- DeepSinger: Singing Voice Synthesis with Data Mined From the Web [194.10598657846145]
DeepSinger is a multi-lingual singing voice synthesis system built from scratch using singing training data mined from music websites.
We evaluate DeepSinger on our mined singing dataset that consists of about 92 hours data from 89 singers on three languages.
arXiv Detail & Related papers (2020-07-09T07:00:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.