Related papers: Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano

Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano

URL: http://arxiv.org/abs/2410.23325v1
Date: Wed, 30 Oct 2024 13:17:13 GMT
Title: Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano
Authors: Zhenyi Hou, Xu Zhao, Kejie Ye, Xinyu Sheng, Shanggerile Jiang, Jiajing Xia, Yitao Zhang, Chenxi Ban, Daijun Luo, Jiaxing Chen, Yan Zou, Yuchao Feng, Guangyu Fan, Xin Yuan,
Abstract summary: We present a novel approach to evaluating Mezzo-soprano vocal techniques using deep learning models. We employ deep learning models pre-trained on the ImageNet and Urbansound8k datasets. Our experimental results indicate that transfer learning increases the overall accuracy (OAcc) of all models by an average of 8.3%.
Score: 13.796982484176207
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vocal education in the music field is difficult to quantify due to the individual differences in singers' voices and the different quantitative criteria of singing techniques. Deep learning has great potential to be applied in music education due to its efficiency to handle complex data and perform quantitative analysis. However, accurate evaluations with limited samples over rare vocal types, such as Mezzo-soprano, requires extensive well-annotated data support using deep learning models. In order to attain the objective, we perform transfer learning by employing deep learning models pre-trained on the ImageNet and Urbansound8k datasets for the improvement on the precision of vocal technique evaluation. Furthermore, we tackle the problem of the lack of samples by constructing a dedicated dataset, the Mezzo-soprano Vocal Set (MVS), for vocal technique assessment. Our experimental results indicate that transfer learning increases the overall accuracy (OAcc) of all models by an average of 8.3%, with the highest accuracy at 94.2%. We not only provide a novel approach to evaluating Mezzo-soprano vocal techniques but also introduce a new quantitative assessment method for music education.

Related papers

Machine Learning Approaches to Vocal Register Classification in Contemporary Male Pop Music [49.1574468325115]
In pop music, where a single artist may use a variety of timbre's and textures to achieve a desired quality, it can be difficult to identify what vocal register within the vocal range a singer is using.<n>This paper presents two methods for classifying vocal registers in an audio signal of male pop music through the analysis of textural features of mel-spectrogram images.
arXiv Detail & Related papers (2025-05-16T15:41:28Z)
Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections. Our work employs explainable descriptors for difficulty estimation in symbolic music representations. Our approach, evaluated in piano repertoire categorized in 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
Singer Identity Representation Learning using Self-Supervised Techniques [0.0]
We propose a framework for training singer identity encoders to extract representations suitable for various singing-related tasks. We explore different self-supervised learning techniques on a large collection of isolated vocal tracks. We evaluate the quality of the resulting representations on singer similarity and identification tasks.
arXiv Detail & Related papers (2024-01-10T10:41:38Z)
Investigating Personalization Methods in Text to Music Generation [21.71190700761388]
Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. For evaluation, we construct a novel dataset with prompts and music clips. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody.
arXiv Detail & Related papers (2023-09-20T08:36:34Z)
Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training [82.94349771571642]
This work proposes a melody-unsupervised multi-speaker pre-training method to enhance the vocal range of the single-speaker. It is the first to introduce a differentiable duration regulator to improve the rhythm naturalness of the synthesized voice. Experimental results verify that the proposed SVS system outperforms the baseline on both sound quality and naturalness.
arXiv Detail & Related papers (2023-09-01T06:40:41Z)
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies [1.2691047660244337]
Self-supervised learning models (SSL models) have been trained using large amounts of unlabeled data in the field of speech processing and music classification. We report the results of experiments comparing SSL models for three different tasks (i.e., singer identification, singing voice transcription, and singing technique classification) as initial exploration and aim to discuss these findings.
arXiv Detail & Related papers (2023-06-22T07:47:18Z)
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models [95.97506031821217]
We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training. The method requires a short (3 seconds) sample from the target person, and generation is steered at inference time, without any training steps.
arXiv Detail & Related papers (2022-06-05T19:45:29Z)
Deep Learning Approach for Singer Voice Classification of Vietnamese Popular Music [1.2043574473965315]
We propose a new method to identify the singer's name based on analysis of Vietnamese popular music. We employ the use of vocal segment detection and singing voice separation as the pre-processing steps. To verify the accuracy of our methods, we evaluate on a dataset of 300 Vietnamese songs from 18 famous singers.
arXiv Detail & Related papers (2021-02-24T08:03:07Z)
Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.