Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
- URL: http://arxiv.org/abs/2505.04419v1
- Date: Wed, 07 May 2025 13:52:50 GMT
- Title: Recognizing Ornaments in Vocal Indian Art Music with Active Annotation
- Authors: Sumit Kumar, Parampreet Singh, Vipul Arora
- Abstract summary: We introduce Rāga Ornamentation Detection (ROD), a novel dataset comprising Indian classical music recordings curated by expert musicians. The dataset is annotated using a custom Human-in-the-Loop tool for six vocal ornaments marked as event-based labels. We develop an ornamentation detection model based on deep time-series analysis, preserving ornament boundaries during the chunking of long audio recordings.
- Score: 2.9219095364935885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ornamentations, embellishments, or microtonal inflections are essential to melodic expression across many musical traditions, adding depth, nuance, and emotional impact to performances. Recognizing ornamentations in singing voices is key to MIR, with potential applications in music pedagogy, singer identification, genre classification, and controlled singing voice generation. However, the lack of annotated datasets and specialized modeling approaches remains a major obstacle to progress in this research area. In this work, we introduce Rāga Ornamentation Detection (ROD), a novel dataset comprising Indian classical music recordings curated by expert musicians. The dataset is annotated using a custom Human-in-the-Loop tool for six vocal ornaments marked as event-based labels. Using this dataset, we develop an ornamentation detection model based on deep time-series analysis, preserving ornament boundaries during the chunking of long audio recordings. We conduct experiments using different train-test configurations within the ROD dataset and also evaluate our approach on a separate, manually annotated dataset of Indian classical concert recordings. Our experimental results show that the proposed approach outperforms the baseline CRNN.
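The boundary-preserving chunking step lends itself to a short illustration. The sketch below is a minimal, hypothetical Python rendering of the idea, not the authors' released code: fixed-length chunk edges are pulled back to an event onset whenever a labelled ornament would otherwise be split across two chunks. All function names, the label format, and the adjustment policy are assumptions.

```python
# Hypothetical sketch of boundary-aware chunking: fixed-size windows are
# shifted so no annotated ornament event straddles a chunk edge.  Events
# are (onset_sec, offset_sec, ornament_name) tuples; this is illustrative,
# not the paper's published implementation.

def chunk_preserving_events(duration, events, chunk_len=10.0):
    """Yield (start, end) chunk boundaries that avoid splitting events."""
    chunks, start = [], 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        # If a labelled event straddles the proposed edge, pull the edge
        # back to the event onset so the whole event lands in the next chunk.
        for onset, offset, _ in events:
            if onset < end < offset:
                end = onset
                break
        if end <= start:  # event longer than a chunk: keep it whole
            end = max(off for on, off, _ in events if on <= start < off)
        chunks.append((start, end))
        start = end
    return chunks

# Example: a 25 s recording with three annotated ornaments.
events = [(4.0, 6.5, "meend"), (9.2, 11.0, "kan"), (18.5, 19.2, "murki")]
print(chunk_preserving_events(25.0, events))
# [(0.0, 9.2), (9.2, 19.2), (19.2, 25.0)]
```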
Related papers
- Predicting Music Track Popularity by Convolutional Neural Networks on Spotify Features and Spectrogram of Audio Waveform [3.6458439734112695]
This study introduces a pioneering methodology that uses Convolutional Neural Networks (CNNs) and Spotify data analysis to forecast the popularity of music tracks.
Our approach takes advantage of Spotify's wide range of features, including acoustic attributes based on the spectrogram of the audio waveform, metadata, and user engagement metrics.
Using a large dataset covering various genres and demographics, our CNN-based model shows impressive effectiveness in predicting the popularity of music tracks.
arXiv Detail & Related papers (2025-05-12T07:03:17Z)
- Automatic Estimation of Singing Voice Musical Dynamics [9.343063100314687]
We propose a methodology for dataset curation.
We compile a dataset comprising 509 musical dynamics annotated singing voice performances, aligned with 163 score files.
We train a CNN model with varying window sizes to evaluate the effectiveness of estimating musical dynamics.
We conclude from our experiments that bark-scale-based features outperform log-Mel features for the task of singing voice dynamics prediction.
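Since librosa ships a log-Mel front-end but no bark filterbank, the sketch below contrasts the two under stated assumptions: bark pooling uses Traunmüller's formula with simple spectral-bin summation, an illustrative stand-in for the paper's exact feature extractor.

```python
# Hedged comparison sketch: log-Mel features via librosa vs. a crude
# bark-band energy front-end.  The bark pooling (Traunmueller's formula,
# plain bin summing) is an assumption, not the paper's feature pipeline.
import numpy as np
import librosa

def hz_to_bark(f):
    """Traunmueller (1990) approximation of the bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_band_energies(y, sr, n_fft=2048, hop_length=512, n_bands=24):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)
    bands = np.clip(hz_to_bark(freqs), 0, None)
    edges = np.linspace(0, hz_to_bark(sr / 2), n_bands + 1)
    # Sum spectral power into bark bands, then compress with log.
    idx = np.digitize(bands, edges) - 1
    E = np.stack([S[idx == b].sum(axis=0) for b in range(n_bands)])
    return np.log1p(E)

y, sr = librosa.load(librosa.example("trumpet"))
logmel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
bark = bark_band_energies(y, sr)
print(logmel.shape, bark.shape)  # (n_mels, T) vs. (n_bands, T)
```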
arXiv Detail & Related papers (2024-10-27T18:15:18Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
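The measure-synchronization idea behind SMT-ABC can be shown with a toy example. The interleaving below (bar 1 of every voice, then bar 2, and so on) is an assumption about the spirit of the format, not the MuPT paper's actual serialization.

```python
# Illustrative sketch of measure-synchronized multi-track serialization in
# the spirit of SMT-ABC: bars from each voice are interleaved so that
# corresponding measures stay adjacent in the token stream.  The layout
# and the "&" joiner are assumptions, not the paper's defined format.
voice_1 = "C2 E2 | G4 | E2 C2 |"
voice_2 = "E,2 G,2 | C4 | G,2 E,2 |"

def interleave_bars(*voices):
    split = [[bar.strip() for bar in v.split("|") if bar.strip()] for v in voices]
    assert len({len(bars) for bars in split}) == 1, "voices must have equal bar counts"
    # Emit bar 1 of every voice, then bar 2 of every voice, and so on.
    return " | ".join(" & ".join(bars) for bars in zip(*split))

print(interleave_bars(voice_1, voice_2))
# 'C2 E2 & E,2 G,2 | G4 & C4 | E2 C2 & G,2 E,2'
```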
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Perceptual Musical Features for Interpretable Audio Tagging [2.1730712607705485]
This study explores the relevance of interpretability in the context of automatic music tagging.
We constructed a workflow that incorporates three different information extraction techniques.
We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset.
arXiv Detail & Related papers (2023-12-18T14:31:58Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- A Phoneme-Informed Neural Network Model for Note-Level Singing Transcription [11.951441023641975]
We propose a method of finding note onsets of singing voice more accurately by leveraging the linguistic characteristics of singing.
Our approach substantially improves the performance of singing transcription and emphasizes the importance of linguistic features in singing analysis.
arXiv Detail & Related papers (2023-04-12T15:36:01Z)
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, summing to around 80 hours of data.
The dataset incorporates YouTube timestamped links for retrieving audio and video, along with rich metadata with regard to instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
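Both reported metrics are easy to reproduce in a few lines. The sketch below computes BLEU with NLTK and a normalized Levenshtein similarity over whitespace tokens; the tokenization and the normalization are assumptions, not the paper's evaluation scripts.

```python
# Hedged sketch of the two reported metrics: sentence BLEU (via NLTK) and a
# normalized edit-distance similarity between generated and reference token
# sequences.  Token choices here are illustrative assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def edit_distance(a, b):
    """Classic Levenshtein distance via one-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

reference = "C D E F | G2 G2 | A A A A".split()
generated = "C D E F | G2 G2 | A A B A".split()

bleu = sentence_bleu([reference], generated,
                     smoothing_function=SmoothingFunction().method1)
similarity = 1 - edit_distance(reference, generated) / max(len(reference), len(generated))
print(f"BLEU={bleu:.3f}  edit-distance similarity={similarity:.3f}")
```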
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Musical Prosody-Driven Emotion Classification: Interpreting Vocalists Portrayal of Emotions Through Machine Learning [0.0]
The role of musical prosody remains under-explored despite several studies demonstrating a strong connection between prosody and emotion.
In this study, we restrict the input of traditional machine learning algorithms to the features of musical prosody.
We utilize a methodology for individual data collection from vocalists and personal ground-truth labeling by the artists themselves.
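A prosody-only pipeline of this kind might look as follows. The feature set (F0 statistics, RMS energy, voicing rate via librosa's pyin) and the random-forest classifier are illustrative choices, not the study's documented setup.

```python
# Hedged sketch of the "prosody-only" restriction: a handful of prosodic
# descriptors extracted with librosa, fed to a traditional classifier.
# Feature choices and classifier are assumptions for illustration.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def prosody_features(y, sr):
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    f0 = f0[np.isfinite(f0)]
    rms = librosa.feature.rms(y=y)[0]
    return np.array([
        f0.mean() if f0.size else 0.0,  # pitch level
        f0.std() if f0.size else 0.0,   # pitch variability
        rms.mean(),                     # overall loudness
        rms.std(),                      # dynamic variability
        voiced.mean(),                  # fraction of voiced frames
    ])

# With per-vocalist recordings X (a list of (y, sr) pairs) and the artists'
# self-reported labels, training would follow the usual sklearn pattern:
# feats = np.stack([prosody_features(y, sr) for y, sr in X])
# clf = RandomForestClassifier(n_estimators=200).fit(feats, labels)
```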
arXiv Detail & Related papers (2021-06-04T15:40:19Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)