Related papers: Detecting Music Performance Errors with Transformers

Detecting Music Performance Errors with Transformers

URL: http://arxiv.org/abs/2501.02030v1
Date: Fri, 03 Jan 2025 07:04:20 GMT
Title: Detecting Music Performance Errors with Transformers
Authors: Benjamin Shiue-Hal Chou, Purvish Jajal, Nicholas John Eliopoulos, Tim Nadolsky, Cheng-Yun Yang, Nikita Ravi, James C. Davis, Kristen Yeon-Ji Yun, Yung-Hsiang Lu,
Abstract summary: Existing tools for music error detection rely on automatic alignment.<n>There is a lack of sufficient data to train music error detection models.<n>We present a novel data generation technique capable of creating large-scale synthetic music error datasets.
Score: 3.6837762419929168
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Beginner musicians often struggle to identify specific errors in their performances, such as playing incorrect notes or rhythms. There are two limitations in existing tools for music error detection: (1) Existing approaches rely on automatic alignment; therefore, they are prone to errors caused by small deviations between alignment targets.; (2) There is a lack of sufficient data to train music error detection models, resulting in over-reliance on heuristics. To address (1), we propose a novel transformer model, Polytune, that takes audio inputs and outputs annotated music scores. This model can be trained end-to-end to implicitly align and compare performance audio with music scores through latent space representations. To address (2), we present a novel data generation technique capable of creating large-scale synthetic music error datasets. Our approach achieves a 64.1% average Error Detection F1 score, improving upon prior work by 40 percentage points across 14 instruments. Additionally, compared with existing transcription methods repurposed for music error detection, our model can handle multiple instruments. Our source code and datasets are available at https://github.com/ben2002chou/Polytune.

Related papers

LadderSym: A Multimodal Interleaved Transformer for Music Practice Error Detection [6.949059287049708]
This paper introduces textitLadderSym, a novel Transformer-based method for music error detection.<n>textitLadderSym is guided by two key observations about the state-of-the-art approaches.<n>We evaluate our method on the textitMAESTRO-E and textitCocoChorales-E datasets by measuring the F1 score for each note category.
arXiv Detail & Related papers (2025-09-16T02:15:06Z)
Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation [49.062766449989525]
Generative models of music audio are typically used to generate output based solely on a text prompt or melody.<n>Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example, using any pretrained diffusion model.
arXiv Detail & Related papers (2025-07-07T10:46:07Z)
Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes [9.283206048560322]
Moonbeam is a transformer-based foundation model for symbolic music.<n>It is pretrained on a large and diverse collection of MIDI data totaling 81.6K hours of music and 18 billion tokens.<n>We open-source the code, pretrained model, and generated samples on Github.
arXiv Detail & Related papers (2025-05-21T14:17:25Z)
Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image. We introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction. We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients. We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z)
Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions. Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
Learning correlated noise in a 39-qubit quantum processor [0.38073142980732994]
Building error-corrected quantum computers relies crucially on measuring and modeling noise on candidate devices. Here we propose a method of extracting detailed information of the noise in a device running syndrome extraction circuits. We show how to extract from the 20 data qubits the information needed to build noise models of various sophistication.
arXiv Detail & Related papers (2023-03-01T19:07:35Z)
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions. We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation. Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures. With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length. Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances. We report SOTA note-level accuracy of the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z)
Real-time error correction and performance aid for MIDI instruments [0.0]
Making a slight mistake during live music performance can easily be spotted by an astute listener. The problem of identifying and correcting such errors can be approached with artificial intelligence. This paper examines state-of-the-art solutions to related problems and explores novel solutions for music error detection and correction.
arXiv Detail & Related papers (2020-11-26T04:28:29Z)
A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation [0.0]
The aim is to obtain a model that can suggest the probability a MIDI clip might be composed condition on the auto-generation hypothesis. The experiment results show our model ranks $3rd$ in all the $7$ teams in the data challenge in CSMT( 2020)
arXiv Detail & Related papers (2020-10-15T13:59:58Z)
Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs. We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.