Related papers: Deep Autotuner: a Pitch Correcting Network for Singing Performances

URL: http://arxiv.org/abs/2002.05511v1
Date: Wed, 12 Feb 2020 01:33:56 GMT
Title: Deep Autotuner: a Pitch Correcting Network for Singing Performances
Authors: Sanna Wager, George Tzanetakis, Cheng-i Wang, Minje Kim
Abstract summary: We introduce a data-driven approach to automatic pitch correction of solo singing performances. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.
Score: 26.019582802302033
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: We introduce a data-driven approach to automatic pitch correction of solo singing performances. The proposed approach predicts note-wise pitch shifts from the relationship between the respective spectrograms of the singing and accompaniment. This approach differs from commercial systems, where vocal track notes are usually shifted to be centered around pitches in a user-defined score, or mapped to the closest pitch among the twelve equal-tempered scale degrees. The proposed system treats pitch as a continuous value rather than relying on a set of discretized notes found in musical scores, thus allowing for improvisation and harmonization in the singing performance. We train our neural network model using a dataset of 4,702 amateur karaoke performances selected for good intonation. Our model is trained on both incorrect intonation, for which it learns a correction, and intentional pitch variation, which it learns to preserve. The proposed deep neural network with gated recurrent units on top of convolutional layers shows promising performance on the real-world score-free singing pitch correction task of autotuning.
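For context, the commercial-style baseline that the abstract contrasts against (mapping a sung pitch to the closest of the twelve equal-tempered scale degrees) can be sketched in a few lines. This is only an illustration of that baseline, not the paper's network; the function name, the A4 = 440 Hz reference, and the use of MIDI note numbers are assumptions made for the example.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; not specified in the abstract


def nearest_semitone_correction(f_hz, a4_hz=A4_HZ):
    """Snap a frequency to the closest equal-tempered pitch.

    Returns (shift_cents, target_hz), where shift_cents is the correction
    to apply to the sung pitch (negative means shift down).
    Hypothetical helper written for illustration only.
    """
    # Continuous pitch on the MIDI scale (69 corresponds to A4).
    midi = 69.0 + 12.0 * math.log2(f_hz / a4_hz)
    # Discretize to the nearest semitone, as a score-free snapping tuner would.
    target_midi = round(midi)
    # One semitone equals 100 cents.
    shift_cents = 100.0 * (target_midi - midi)
    target_hz = a4_hz * 2.0 ** ((target_midi - 69) / 12.0)
    return shift_cents, target_hz


if __name__ == "__main__":
    shift, target = nearest_semitone_correction(452.0)
    print(f"shift: {shift:.2f} cents, target: {target:.2f} Hz")
```

A snapping rule like this always pulls the voice onto the grid, so it would also flatten deliberate bends or harmonies, which is the limitation the abstract says a continuous, data-driven pitch-shift prediction is meant to avoid.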

Related papers

BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference [6.7611107349018456]
BERT-APC is a novel reference-free Automatic Pitch Correction framework. It corrects pitch errors while maintaining the natural expressiveness of vocal performances. BERT-APC demonstrated superior performance in note pitch prediction, outperforming the second-best model, ROSVOT, by 10.49%p on highly detuned samples.
arXiv Detail & Related papers (2025-11-25T07:16:49Z)
Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of single-speaker singing voice synthesis (SVS). It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice. Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z)
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types. We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input. In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
Karaoker: Alignment-free singing voice synthesis with speech training data [3.9795908407245055]
Karaoker is a multispeaker Tacotron-based model conditioned on voice characteristic features. The model is jointly conditioned with a single deep convolutional encoder on continuous data. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks.
arXiv Detail & Related papers (2022-04-08T15:33:59Z)
Improved singing voice separation with chromagram-based pitch-aware remixing [26.299721372221736]
We propose chromagram-based pitch-aware remixing, where music segments with high pitch alignment are mixed. We demonstrate that training models with pitch-aware remixing significantly improves the test signal-to-distortion ratio (SDR).
arXiv Detail & Related papers (2022-03-28T20:55:54Z)
Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB). Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre. We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z)
TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions. We present an improved input representation, the Tone-CFP, that explicitly groups harmonics. We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z)
DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain that iteratively converts noise into a mel-spectrogram conditioned on the music score. The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work by a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z)
Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method combines an acoustic model, trained for the task of automatic speech recognition, with melody-extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated. We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.