Related papers: Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu

URL: http://arxiv.org/abs/2507.04858v1
Date: Mon, 07 Jul 2025 10:32:26 GMT
Title: Towards Human-in-the-Loop Onset Detection: A Transfer Learning Approach for Maracatu
Authors: António Sá Pinto,
Abstract summary: We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We explore transfer learning strategies for musical onset detection in the Afro-Brazilian Maracatu tradition, which features complex rhythmic patterns that challenge conventional models. We adapt two Temporal Convolutional Network architectures: one pre-trained for onset detection (intra-task) and another for beat tracking (inter-task). Using only 5-second annotated snippets per instrument, we fine-tune these models through layer-wise retraining strategies for five traditional percussion instruments. Our results demonstrate significant improvements over baseline performance, with F1 scores reaching up to 0.998 in the intra-task setting and improvements of over 50 percentage points in best-case scenarios. The cross-task adaptation proves particularly effective for time-keeping instruments, where onsets naturally align with beat positions. The optimal fine-tuning configuration varies by instrument, highlighting the importance of instrument-specific adaptation strategies. This approach addresses the challenges of underrepresented musical traditions, offering an efficient human-in-the-loop methodology that minimizes annotation effort while maximizing performance. Our findings contribute to more inclusive music information retrieval tools applicable beyond Western musical contexts.
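The F1 scores quoted in the abstract are the usual way of scoring onset detectors: an estimated onset counts as correct if it falls close enough to an annotated one. The abstract does not give the evaluation details, so the following is only a minimal sketch under assumptions: the function name `onset_f1`, the 50 ms tolerance window, and the greedy matching are illustrative choices, not the authors' protocol. Libraries such as `mir_eval` provide a more careful implementation.

```python
from typing import Sequence, Tuple


def onset_f1(reference: Sequence[float],
             estimated: Sequence[float],
             window: float = 0.05) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for onset times given in seconds.

    Assumption: an estimate is a hit if it lies within +/- `window`
    seconds of a reference onset, and each onset is matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Greedy two-pointer matching over the two sorted onset lists.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= window:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate precedes any usable reference: false positive
        else:
            i += 1  # reference onset has no estimate nearby: false negative
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up onset times for illustration only; not data from the paper.
    annotated = [0.50, 1.00, 1.50, 2.00]
    detected = [0.51, 1.02, 1.70, 2.01, 2.50]
    print(onset_f1(annotated, detected))
```

With a 5-second annotated snippet per instrument, a function like this could score a fine-tuned model against the annotations before and after adaptation, which is how an improvement measured in percentage points of F1 would be read.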

Related papers

MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens. We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time. We demonstrate a surprisingly wide range of applications for music generation, including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data [100.33096338195723]
We focus on Few-shot Learning with Auxiliary Data (FLAD). FLAD assumes access to auxiliary data during few-shot learning in hopes of improving generalization. We propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with prior FLAD methods that either explore or exploit.
arXiv Detail & Related papers (2023-02-01T18:59:36Z)
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions. We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
Supervised and Unsupervised Learning of Audio Representations for Music Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z)
Adaptive Few-Shot Learning Algorithm for Rare Sound Event Detection [24.385226516231004]
We propose a novel task-adaptive module that is easy to plug into any metric-based few-shot learning framework. Our module improves performance considerably over baseline methods on two datasets.
arXiv Detail & Related papers (2022-05-24T03:13:12Z)
SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance [88.0355290619761]
This work focuses on the separation of unknown musical instruments. We propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories. Our framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
arXiv Detail & Related papers (2022-03-25T09:42:11Z)
Deep-Learning Architectures for Multi-Pitch Estimation: Towards Reliable Evaluation [7.599399338954308]
Multi-pitch estimation aims to detect the simultaneous activity of pitches in polyphonic music recordings. In this paper, we realize different architectures based on CNNs, the U-net structure, and self-attention components. We compare variants of these architectures in different sizes for multi-pitch estimation using the MusicNet and Schubert Winterreise datasets.
arXiv Detail & Related papers (2022-02-18T13:52:21Z)
BERT-like Pre-training for Symbolic Piano Music Classification Tasks [15.02723006489356]
This article presents a benchmark study of symbolic piano music classification using the Bidirectional Representations from Transformers (BERT) approach. We pre-train two 12-layer Transformer models using the BERT approach and fine-tune them for four downstream classification tasks. Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
arXiv Detail & Related papers (2021-07-12T07:03:57Z)
Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation. We show that our method saves two-thirds of the training time while maintaining comparable, or achieving even better, generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z)
Supervised Chorus Detection for Popular Music Using Convolutional Neural Network and Multi-task Learning [10.160205869706965]
This paper presents a novel supervised approach to detecting the chorus segments in popular music. We propose a convolutional neural network with a multi-task learning objective, which simultaneously fits two temporal activation curves. We also propose a post-processing method that jointly takes into account the chorus and boundary predictions to produce binary output.
arXiv Detail & Related papers (2021-03-26T04:32:08Z)
Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network. Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.