PrimaDNN': A Characteristics-aware DNN Customization for Singing
Technique Detection
- URL: http://arxiv.org/abs/2306.14191v1
- Date: Sun, 25 Jun 2023 10:15:18 GMT
- Title: PrimaDNN': A Characteristics-aware DNN Customization for Singing
Technique Detection
- Authors: Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
- Abstract summary: We propose PrimaDNN, a deep neural network model with a characteristics-oriented improvement.
On J-POP singing technique detection, PrimaDNN achieved the best overall macro-F measure of 44.9%.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Professional vocalists modulate their voice timbre or pitch to make their
vocal performance more expressive. Such fluctuations are called singing
techniques. Automatic detection of singing techniques from audio tracks can be
beneficial to understand how each singer expresses the performance, yet it can
also be difficult due to the wide variety of the singing techniques. A deep
neural network (DNN) model can handle such variety; however, there might be a
possibility that considering the characteristics of the data improves the
performance of singing technique detection. In this paper, we propose PrimaDNN,
a CRNN model with a characteristics-oriented improvement. The features of the
model are: 1) input feature representation based on auxiliary pitch information
and multi-resolution mel spectrograms, 2) Convolution module based on the
Squeeze-and-excitation (SENet) and the Instance normalization. In the results
of J-POP singing technique detection, PrimaDNN achieved the best results of
44.9% at the overall macro-F measure, compared to conventional works. We also
found that the contribution of each component varies depending on the type of
singing technique.
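The convolution module described above combines a convolution layer, instance normalization, and a squeeze-and-excitation channel gate. A minimal PyTorch sketch of such a block follows; the layer sizes, kernel size, and reduction ratio are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SEConvBlock(nn.Module):
    """Conv2d -> instance norm -> ReLU, followed by an SE channel gate."""

    def __init__(self, in_ch: int, out_ch: int, reduction: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch)  # normalize each feature map per sample
        self.act = nn.ReLU()
        # Squeeze: global average pool; excitation: bottleneck MLP with sigmoid gate
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(out_ch // reduction, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.act(self.norm(self.conv(x)))
        return h * self.se(h)  # reweight channels by learned importance

block = SEConvBlock(1, 16)
mel = torch.randn(2, 1, 128, 64)  # (batch, channel, mel bins, frames)
out = block(mel)
print(out.shape)  # torch.Size([2, 16, 128, 64])
```

Instance normalization here normalizes each channel per sample, which can help the model stay robust to singer-dependent loudness and timbre offsets, while the SE gate lets it emphasize channels that respond to a given technique.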
Related papers
- Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt [50.25271407721519]
We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language.
We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation.
Experiments show that our model achieves favorable controlling ability and audio quality.
arXiv Detail & Related papers (2024-03-18T13:39:05Z) - MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice
Enhancement [8.782080886602145]
We propose MBTFNet, a novel multi-band temporal-frequency neural network for singing voice enhancement.
MBTFNet removes background music, noise and even backing vocals from singing recordings.
Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
arXiv Detail & Related papers (2023-10-06T16:44:47Z) - Enhancing the vocal range of single-speaker singing voice synthesis with
melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of a single-speaker SVS system.
It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice.
Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z) - Learning the Beauty in Songs: Neural Singing Voice Beautifier [69.21263011242907]
We are interested in a novel task, singing voice beautifying (SVB).
Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice, while keeping the content and vocal timbre.
We introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task.
arXiv Detail & Related papers (2022-02-27T03:10:12Z) - TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic
Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z) - Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System [25.573552964889963]
This paper presents Sinsy, a deep neural network (DNN)-based singing voice synthesis (SVS) system.
The proposed system is composed of four modules: a time-lag model, a duration model, an acoustic model, and a vocoder.
Experimental results show our system can synthesize a singing voice with better timing, more natural vibrato, and correct pitch.
arXiv Detail & Related papers (2021-08-05T17:59:58Z) - DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion [51.83469048737548]
We propose DiffSVC, an SVC system based on denoising diffusion probabilistic model.
A denoising module is trained in DiffSVC, which takes a corrupted mel spectrogram and its corresponding diffusion step as input to predict the added Gaussian noise.
Experiments show that DiffSVC can achieve superior conversion performance in terms of naturalness and voice similarity to current state-of-the-art SVC approaches.
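The training target described for DiffSVC follows the standard denoising diffusion (DDPM) setup: a clean mel spectrogram is corrupted in closed form at a sampled step, and the network learns to predict the injected Gaussian noise. A NumPy sketch of that forward process and loss is below; the schedule values and spectrogram shape are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

x0 = rng.standard_normal((80, 200))  # stand-in for a clean mel spectrogram (bins x frames)
t = 50
eps = rng.standard_normal(x0.shape)  # the Gaussian noise the denoiser must predict

# Forward process: corrupt x0 into x_t in a single closed-form step
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def ddpm_loss(eps_hat: np.ndarray, eps: np.ndarray) -> float:
    """L2 loss between predicted and true noise, as in DDPM training."""
    return float(np.mean((eps_hat - eps) ** 2))

# A trained denoiser eps_hat = f(x_t, t) would minimize this; a perfect
# prediction drives the loss to zero.
print(ddpm_loss(eps, eps))  # 0.0
```

At inference, the learned denoiser is applied iteratively from pure noise back to a mel spectrogram, which a vocoder then renders as audio.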
arXiv Detail & Related papers (2021-05-28T14:26:40Z) - DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis [53.19363127760314]
DiffSinger is a parameterized Markov chain which iteratively converts the noise into mel-spectrogram conditioned on the music score.
The evaluations conducted on the Chinese singing dataset demonstrate that DiffSinger outperforms state-of-the-art SVS work with a notable margin.
arXiv Detail & Related papers (2021-05-06T05:21:42Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.