Deep Autotuner: a Pitch Correcting Network for Singing Performances
- URL: http://arxiv.org/abs/2002.05511v1
- Date: Wed, 12 Feb 2020 01:33:56 GMT
- Title: Deep Autotuner: a Pitch Correcting Network for Singing Performances
- Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim
- Abstract summary: We introduce a data-driven approach to automatic pitch correction of solo singing performances.
We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation.
The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.
- Score: 26.019582802302033
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We introduce a data-driven approach to automatic pitch correction of solo
singing performances. The proposed approach predicts note-wise pitch shifts
from the relationship between the respective spectrograms of the singing and
accompaniment. This approach differs from commercial systems, where vocal track
notes are usually shifted to be centered around pitches in a user-defined
score, or mapped to the closest pitch among the twelve equal-tempered scale
degrees. The proposed system treats pitch as a continuous value rather than
relying on a set of discretized notes found in musical scores, thus allowing
for improvisation and harmonization in the singing performance. We train our
neural network model using a dataset of 4,702 amateur karaoke performances
selected for good intonation. Our model is trained on both incorrect
intonation, for which it learns a correction, and intentional pitch variation,
which it learns to preserve. The proposed deep neural network with gated
recurrent units on top of convolutional layers shows promising performance on
the real-world score-free singing pitch correction task of autotuning.
Related papers
- Enhancing the vocal range of single-speaker singing voice synthesis with
melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of the single-speaker.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Karaoker: Alignment-free singing voice synthesis with speech training
data [3.9795908407245055]
Karaoker is a multispeaker Tacotron-based model conditioned on voice characteristic features.
The model is jointly conditioned with a single deep convolutional encoder on continuous data.
We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks.
arXiv Detail & Related papers (2022-04-08T15:33:59Z) - Improved singing voice separation with chromagram-based pitch-aware
remixing [26.299721372221736]
We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed.
We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR)
arXiv Detail & Related papers (2022-03-28T20:55:54Z) - Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB)
Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre.
We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z) - TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic
Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
Third, we propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z) - DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z) - Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.