SingAug: Data Augmentation for Singing Voice Synthesis with
Cycle-consistent Training Strategy
- URL: http://arxiv.org/abs/2203.17001v1
- Date: Thu, 31 Mar 2022 12:50:10 GMT
- Title: SingAug: Data Augmentation for Singing Voice Synthesis with
Cycle-consistent Training Strategy
- Authors: Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin
- Abstract summary: Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing of better quality than conventional methods.
In this work, we explore different data augmentation methods to boost the training of SVS systems.
To further stabilize the training, we introduce a cycle-consistent training strategy.
- Score: 69.24683717901262
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning based singing voice synthesis (SVS) systems have been
demonstrated to flexibly generate singing of better quality than conventional
statistical parametric methods. However, neural systems are generally
data-hungry and have difficulty reaching reasonable singing quality with the
limited publicly available training data. In this work, we explore different
data augmentation methods to boost the training of SVS systems, including
several strategies customized to SVS based on pitch augmentation and mix-up
augmentation. To further stabilize the training, we introduce a
cycle-consistent training strategy. Extensive experiments on two public singing
databases demonstrate that our proposed augmentation methods and the
stabilizing training strategy significantly improve performance on both
objective and subjective evaluations.
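The abstract names two SVS-specific augmentations, pitch augmentation and mix-up. As a rough illustration only, here is a minimal PyTorch-style sketch of how such augmentations are commonly applied to paired score/acoustic training data; the function names, tensor shapes, semitone shift, and Beta parameter are assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of SVS-style data augmentation; not the paper's implementation.
import torch

def pitch_augment(f0_hz: torch.Tensor, score_midi: torch.Tensor, semitones: int):
    """Transpose one training pair: scale the target F0 contour and shift the
    score's MIDI note numbers by the same number of semitones, so the score
    input and the acoustic target stay consistent."""
    factor = 2.0 ** (semitones / 12.0)
    return f0_hz * factor, score_midi + semitones

def mixup_augment(feats_a, feats_b, target_a, target_b, alpha: float = 0.2):
    """Blend two equally shaped training examples and their targets with a
    Beta(alpha, alpha)-sampled mixing weight (standard mix-up)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    feats = lam * feats_a + (1.0 - lam) * feats_b
    target = lam * target_a + (1.0 - lam) * target_b
    return feats, target, lam

# Toy usage: transpose one utterance up two semitones before a training step.
f0 = torch.full((1, 200), 220.0)   # frame-level F0 in Hz (A3)
midi = torch.full((1, 200), 57.0)  # frame-level score pitch (MIDI 57 = A3)
f0_up, midi_up = pitch_augment(f0, midi, semitones=2)
```

Shifting the score input and the F0 target together is what makes a pitch augmentation usable for score-conditioned SVS, as opposed to shifting the audio alone.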
Related papers
- MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance [14.22941848955693]
MakeSinger is a semi-supervised training method for singing voice synthesis.
Our novel dual guiding mechanism gives text and pitch guidance on the reverse diffusion step.
We demonstrate that by adding Text-to-Speech (TTS) data in training, the model can synthesize the singing voices of TTS speakers even without their singing voices.
arXiv Detail & Related papers (2024-06-10T01:47:52Z)
- SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion [12.454955437047573]
We propose a Self-supervised Pitch Augmentation method for Singing Voice Conversion (SPA-SVC)
We introduce a cycle pitch shifting training strategy and Structural Similarity Index (SSIM) loss into our SVC model, effectively enhancing its performance.
Experimental results on the public singing dataset M4Singer indicate that our proposed method significantly improves model performance.
arXiv Detail & Related papers (2024-06-09T08:34:01Z)
- Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z)
- SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection [51.99159169107426]
We present our novel systems developed for the SemEval-2024 hallucination detection task.
Our investigation spans a range of strategies to compare model predictions with reference standards.
We introduce three distinct methods that exhibit strong performance metrics.
arXiv Detail & Related papers (2024-04-09T09:03:44Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Weighted Ensemble Self-Supervised Learning [67.24482854208783]
Ensembling has proven to be a powerful technique for boosting model performance.
We develop a framework that permits data-dependent weighted cross-entropy losses.
Our method outperforms both in multiple evaluation metrics on ImageNet-1K.
arXiv Detail & Related papers (2022-11-18T02:00:17Z)
- Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z)
- Improving GANs with A Dynamic Discriminator [106.54552336711997]
We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task.
A comprehensive empirical study confirms that the proposed training strategy, termed DynamicD, improves the synthesis performance without incurring any additional cost or training objectives.
arXiv Detail & Related papers (2022-09-20T17:57:33Z)
- Boost AI Power: Data Augmentation Strategies with unlabelled Data and Conformal Prediction, a Case in Alternative Herbal Medicine Discrimination with Electronic Nose [12.31253329379136]
The electronic nose has proven effective in alternative herbal medicine classification, but due to the nature of supervised learning, previous research relies on costly labelled training data.
This study aims to improve classification accuracy via data augmentation strategies.
arXiv Detail & Related papers (2021-02-05T10:25:36Z)
- Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition [33.032361181388886]
We provide an overview of distributed training techniques for deep neural network acoustic models for ASR.
Experiments are carried out on a popular public benchmark to study the convergence, speedup and recognition performance of the investigated strategies.
arXiv Detail & Related papers (2020-02-24T19:31:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.