Improved singing voice separation with chromagram-based pitch-aware
remixing
- URL: http://arxiv.org/abs/2203.15092v1
- Date: Mon, 28 Mar 2022 20:55:54 GMT
- Title: Improved singing voice separation with chromagram-based pitch-aware
remixing
- Authors: Siyuan Yuan, Zhepei Wang, Umut Isik, Ritwik Giri, Jean-Marc Valin,
Michael M. Goodwin, Arvindh Krishnaswamy
- Abstract summary: We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed.
We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
- Score: 26.299721372221736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Singing voice separation aims to separate music into vocals and accompaniment
components. One of the major constraints for the task is the limited amount of
training data with separated vocals. Data augmentation techniques such as
random source mixing have been shown to make better use of existing data and
mildly improve model performance. We propose a novel data augmentation
technique, chromagram-based pitch-aware remixing, where music segments with
high pitch alignment are mixed. By performing controlled experiments in both
supervised and semi-supervised settings, we demonstrate that training models
with pitch-aware remixing significantly improves the test signal-to-distortion
ratio (SDR).
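The abstract does not say how pitch alignment is measured or how segments are paired, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simple STFT-based chromagram, cosine similarity between time-averaged chroma vectors as the alignment score, and greedy pairing of each vocal segment with its best-aligned accompaniment. The function names (`chromagram`, `pitch_alignment`, `pitch_aware_remix`) and all parameter values are hypothetical.

```python
import numpy as np

def chromagram(signal, sr, n_fft=2048, hop=512):
    """Return a (12, n_frames) chroma matrix for a mono signal (illustrative)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    valid = (freqs >= 55.0) & (freqs <= 5000.0)      # assumed band; skips DC
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)   # A4 = 440 Hz = MIDI 69
    pitch_class = np.round(midi).astype(int) % 12
    band = mag[:, valid]
    chroma = np.zeros((12, n_frames))
    for pc in range(12):
        # Sum the magnitude of every bin that maps to this pitch class.
        chroma[pc] = band[:, pitch_class == pc].sum(axis=1)
    return chroma

def pitch_alignment(chroma_a, chroma_b, eps=1e-8):
    """Cosine similarity of time-averaged chroma vectors (assumed score)."""
    a = chroma_a.mean(axis=1)
    b = chroma_b.mean(axis=1)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def pitch_aware_remix(vocals, accompaniments, sr):
    """Mix each vocal segment with its best-aligned accompaniment segment.

    All segments are assumed to be equal-length mono arrays.
    Returns a list of (mixture, vocal, accompaniment, chosen_index) tuples.
    """
    acc_chroma = [chromagram(a, sr) for a in accompaniments]
    remixed = []
    for v in vocals:
        v_chroma = chromagram(v, sr)
        scores = [pitch_alignment(v_chroma, c) for c in acc_chroma]
        best = int(np.argmax(scores))
        remixed.append((v + accompaniments[best], v, accompaniments[best], best))
    return remixed

# Toy usage: an A4 "vocal" tone with A3 and D#4 "accompaniment" tones.
sr = 22050
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440.0 * t)
accs = [np.sin(2 * np.pi * 220.0 * t), np.sin(2 * np.pi * 311.13 * t)]
pairs = pitch_aware_remix([vocal], accs, sr)
```

In the toy usage, the A3 tone shares its pitch class with the A4 vocal, so it scores higher than the D#4 tone and is selected for mixing. A real pipeline would operate on segments of actual stems and might sample among high-scoring candidates rather than always taking the single best match.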
Related papers
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and
Music Synthesis [0.0]
We introduce SpecDiff-GAN, a neural vocoder based on HiFi-GAN.
We show the merits of our proposed model for speech and music synthesis on several datasets.
arXiv Detail & Related papers (2024-01-30T09:17:57Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping, as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z) - Resource-constrained stereo singing voice cancellation [1.0962868591006976]
We study the problem of stereo singing voice cancellation.
Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial.
arXiv Detail & Related papers (2024-01-22T16:05:30Z)
- Enhancing the vocal range of single-speaker singing voice synthesis with
melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of the single speaker.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Music Mixing Style Transfer: A Contrastive Learning Approach to
Disentangle Audio Effects [23.29395422386749]
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.
This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording.
arXiv Detail & Related papers (2022-11-04T03:45:17Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with
Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts noise into a mel-spectrogram conditioned on the music score.
The evaluations conducted on a Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work by a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
- Modeling the Compatibility of Stem Tracks to Generate Music Mashups [6.922825755771942]
A music mashup combines audio elements from two or more songs to create a new work.
Research has developed algorithms that predict the compatibility of audio elements.
arXiv Detail & Related papers (2021-03-26T01:51:11Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method uses an acoustic model trained for automatic speech recognition, together with melody-extracted features, to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- Deep Autotuner: a Pitch Correcting Network for Singing Performances [26.019582802302033]
We introduce a data-driven approach to automatic pitch correction of solo singing performances.
We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation.
The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.
arXiv Detail & Related papers (2020-02-12T01:33:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.