Related papers: Singer Identity Representation Learning using Self-Supervised Techniques


URL: http://arxiv.org/abs/2401.05064v1
Date: Wed, 10 Jan 2024 10:41:38 GMT
Title: Singer Identity Representation Learning using Self-Supervised Techniques
Authors: Bernardo Torres, Stefan Lattner and Gaël Richard
Abstract summary: We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks. We evaluate the quality of the resulting representations on singer similarity and identification tasks.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Significant strides have been made in creating voice identity representations using speech data. However, the same level of progress has not been achieved for singing voices. To bridge this gap, we suggest a framework for training singer identity encoders to extract representations suitable for various singing-related tasks, such as singing voice similarity and synthesis. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks and apply data augmentations during training to ensure that the representations are invariant to pitch and content variations. We evaluate the quality of the resulting representations on singer similarity and identification tasks across multiple datasets, with a particular emphasis on out-of-domain generalization. Our proposed framework produces high-quality embeddings that outperform both speaker verification and wav2vec 2.0 pre-trained baselines on singing voice while operating at 44.1 kHz. We release our code and trained models to facilitate further research on singing voice and related areas.
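The abstract says the authors explore "different self-supervised learning techniques" with augmentations that make the embeddings invariant to pitch and content, but it does not name them here. As an illustration only, the sketch below implements one widely used option, a SimCLR-style contrastive loss (NT-Xent). It assumes that a positive pair is two augmented views (for example, differently pitch-shifted excerpts) of the same vocal track and that every other item in the batch is a negative. This is not the authors' code. The function names and the temperature value are hypothetical, and NumPy is used for the forward computation only (no gradients).

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project each embedding onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """SimCLR-style NT-Xent loss (hypothetical helper, not from the paper).

    z_a[i] and z_b[i] are embeddings of two augmented views of the same vocal
    track (assumed positives); all other rows in the batch act as negatives.
    """
    n = z_a.shape[0]
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))  # shape (2n, d)
    sim = z @ z.T / temperature                            # scaled cosine similarities, (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                         # a sample is never its own candidate
    # Row i's positive is i + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the true positive.
    return -log_prob[np.arange(2 * n), pos].mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = rng.normal(size=(8, 16))
    agreeing_views = anchor + 0.01 * rng.normal(size=(8, 16))  # views that match their anchor
    unrelated_views = rng.normal(size=(8, 16))                  # views that do not
    print("matched:", nt_xent_loss(anchor, agreeing_views))
    print("unrelated:", nt_xent_loss(anchor, unrelated_views))
```

The loss is low when each embedding is closest to the other view of its own track, which is the property an augmentation-invariant singer encoder is meant to learn. Other self-supervised objectives (non-contrastive or predictive ones) would replace this function entirely.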

Related papers

Automatic Estimation of Singing Voice Musical Dynamics
We propose a methodology for dataset curation and compile a dataset of 509 singing voice performances annotated with musical dynamics and aligned with 163 score files. We train a CNN with varying window sizes to evaluate how well musical dynamics can be estimated. Our experiments show that Bark-scale features outperform log-Mel features for singing voice dynamics prediction.
arXiv (2024-10-27T18:15:18Z)
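The Bark-versus-log-Mel comparison above hinges on how each scale warps the frequency axis. The summary does not say which formulas the authors used, so the sketch below relies on two standard textbook warpings as an assumption: the HTK-style mel formula and Traunmüller's (1990) Bark approximation, with the latter's low- and high-frequency corrections omitted. All function names are hypothetical. The sketch places equally spaced band centres on each warped axis and maps them back to Hz, showing how the two scales distribute filters differently.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula: 1000 Hz maps to roughly 1000 mel.
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def hz_to_bark(f_hz):
    # Traunmüller (1990) Bark approximation, without its edge corrections.
    f = np.asarray(f_hz, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark (valid for z < 26.28).
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)

def band_centers_hz(n_bands, f_max, scale="mel"):
    # Space n_bands centres uniformly on the warped axis, then map them back to Hz.
    fwd, inv = (hz_to_mel, mel_to_hz) if scale == "mel" else (hz_to_bark, bark_to_hz)
    points = np.linspace(fwd(0.0), fwd(f_max), n_bands + 2)
    return inv(points[1:-1])  # drop the two endpoints (0 Hz and f_max)

if __name__ == "__main__":
    # 44.1 kHz audio has a Nyquist frequency of 22050 Hz.
    for scale in ("mel", "bark"):
        centres = band_centers_hz(24, 22050.0, scale)
        print(scale, np.round(centres[:6]).astype(int))
```

Comparing the printed centres shows where each scale concentrates its resolution. Any real feature pipeline would add filter shapes, framing, and log compression on top of this mapping.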
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
GTSinger is a large global, multi-technique, free-to-use, high-quality singing corpus with realistic music scores. We collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset. We conduct four benchmark experiments: technique-controllable singing voice synthesis, technique recognition, style transfer, and speech-to-singing conversion.
arXiv (2024-09-20T18:18:14Z)
StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Style transfer for out-of-domain singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles. StyleSinger is the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. In zero-shot style transfer evaluations, StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples.
arXiv (2023-12-17T15:26:16Z)
Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of a single-speaker singing voice synthesis (SVS) model. It is also the first to introduce a differentiable duration regulator to improve the rhythmic naturalness of the synthesized voice. Experimental results verify that the proposed SVS system outperforms the baseline in both sound quality and naturalness.
arXiv (2023-09-01T06:40:41Z)
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations. We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv (2023-05-30T17:59:26Z)
Audiovisual Singing Voice Separation
A video model takes mouth movements as input and fuses them into the feature embeddings of an audio-based separation framework. We create two audiovisual singing performance datasets for training and evaluation. The proposed method outperforms audio-only methods in separation quality on most test recordings.
arXiv (2021-07-01T06:04:53Z)
VAW-GAN for Singing Voice Conversion with Non-parallel Training Data
We propose a singing voice conversion framework based on VAW-GAN. We train an encoder to disentangle singer identity and singing prosody (F0) from phonetic content. By conditioning on singer identity and F0, the decoder generates output spectral features with unseen target singer identity.
arXiv (2020-08-10T09:44:10Z)
Unsupervised Cross-Domain Singing Voice Conversion
We present a wav-to-wav generative model for singing voice conversion from any identity. Our method combines an acoustic model trained for automatic speech recognition with extracted melody features to drive a waveform-based generator.
arXiv (2020-08-06T18:29:11Z)
Addressing the confounds of accompaniments in singer identification
We employ open-unmix, an open source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two means to train a singer identification model: by learning from the separated vocal only, or from an augmented set of data.
arXiv (2020-02-17T07:49:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.