The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction
- URL: http://arxiv.org/abs/2010.00059v1
- Date: Wed, 30 Sep 2020 19:03:35 GMT
- Title: The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction
- Authors: Andrew McLeod, James Owers, Kazuyoshi Yoshii
- Abstract summary: We introduce the MIDI Degradation Toolkit (MDTK), containing functions which take a musical excerpt as input and return a degraded version of it.
Using the toolkit, we create the Altered and Corrupted MIDI Excerpts dataset version 1.0.
We propose four tasks of increasing difficulty to detect, classify, locate, and correct the degradations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the MIDI Degradation Toolkit (MDTK), containing
functions which take as input a musical excerpt (a set of notes with pitch,
onset time, and duration), and return a "degraded" version of that excerpt with
some error (or errors) introduced. Using the toolkit, we create the Altered and
Corrupted MIDI Excerpts dataset version 1.0 (ACME v1.0), and propose four tasks
of increasing difficulty to detect, classify, locate, and correct the
degradations. We hypothesize that models trained for these tasks can be useful
in (for example) improving automatic music transcription performance if applied
as a post-processing step. To that end, MDTK includes a script that measures
the distribution of different types of errors in a transcription, and creates a
degraded dataset with similar properties. MDTK's degradations can also be
applied dynamically to a dataset during training (with or without the above
script), generating novel degraded excerpts each epoch. MDTK could also be used
to test the robustness of any system designed to take MIDI (or similar) data as
input (e.g. systems designed for voice separation, metrical alignment, or chord
detection) to such transcription errors or otherwise noisy data. The toolkit
and dataset are both publicly available online, and we encourage contribution
and feedback from the community.
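The abstract describes degradations as functions that take an excerpt (notes with pitch, onset time, and duration) and return a copy with an error introduced. Below is a minimal sketch of one such degradation, a random pitch shift applied to a single note. The function name, note representation, and parameters are illustrative assumptions, not MDTK's actual API.

```python
import random

def pitch_shift(excerpt, max_shift=12, rng=None):
    """Return a copy of the excerpt with one note's pitch randomly shifted.

    A sketch of an MDTK-style degradation: the input is a list of notes,
    each a dict with "pitch" (MIDI note number), "onset", and "duration";
    the output is a new list with exactly one pitch altered.
    """
    rng = rng or random.Random()
    degraded = [dict(note) for note in excerpt]  # copy so the original is untouched
    victim = rng.randrange(len(degraded))
    # Choose a nonzero shift so the degradation always changes something.
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    # Clamp to the valid MIDI pitch range 0-127.
    degraded[victim]["pitch"] = max(0, min(127, degraded[victim]["pitch"] + shift))
    return degraded

excerpt = [
    {"pitch": 60, "onset": 0,    "duration": 500},  # C4
    {"pitch": 64, "onset": 500,  "duration": 500},  # E4
    {"pitch": 67, "onset": 1000, "duration": 500},  # G4
]
degraded = pitch_shift(excerpt, rng=random.Random(0))
```

Because such a function is cheap and seeded, it could be applied on the fly during training, producing a freshly degraded copy of each excerpt every epoch, which is the dynamic-augmentation usage the abstract describes.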
Related papers
- Detecting Music Performance Errors with Transformers (arXiv, 2025-01-03)
Existing tools for music error detection rely on automatic alignment.
There is a lack of sufficient data to train music error detection models.
We present a novel data generation technique capable of creating large-scale synthetic music error datasets.
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features (arXiv, 2024-12-17)
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
- YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation (arXiv, 2024-07-05)
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors.
- Parameter-tuning-free Data Entry Error Unlearning with Adaptive Selective Synaptic Dampening (arXiv, 2024-02-06)
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
- Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription (arXiv, 2023-09-27)
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework achieves performance comparable to state-of-the-art instrument-agnostic transcription methods.
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation (arXiv, 2023-08-14)
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models and find that it improves performance compared to just prompting for scores.
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task (arXiv, 2022-11-21)
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit-distance similarity.
- Unaligned Supervision For Automatic Music Transcription in The Wild (arXiv, 2022-04-28)
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report state-of-the-art note-level accuracy on the MAPS dataset, and large favorable margins in cross-dataset evaluations.
- Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction (arXiv, 2021-06-03)
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding (arXiv, 2021-04-13)
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) show that the method not only outperforms baseline models on real-world (noisy) corpora but also enhances robustness, producing high-quality results in noisy environments.
- Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset (arXiv, 2020-04-01)
The Expanded Groove MIDI dataset (E-GMD) contains 444 hours of audio from 43 drum kits.
We use E-GMD to optimize classifiers for use in downstream generation by predicting expressive dynamics (velocity), and show with listening tests that they produce outputs with improved perceptual quality, despite similar results on classification metrics.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.