The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction
- URL: http://arxiv.org/abs/2010.00059v1
- Date: Wed, 30 Sep 2020 19:03:35 GMT
- Title: The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction
- Authors: Andrew McLeod, James Owers, Kazuyoshi Yoshii
- Abstract summary: We introduce the MIDI Degradation Toolkit (MDTK), containing functions which take a musical excerpt as input and return a degraded version of it.
Using the toolkit, we create the Altered and Corrupted MIDI Excerpts dataset version 1.0.
We propose four tasks of increasing difficulty to detect, classify, locate, and correct the degradations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce the MIDI Degradation Toolkit (MDTK), containing
functions which take as input a musical excerpt (a set of notes with pitch,
onset time, and duration), and return a "degraded" version of that excerpt with
some error (or errors) introduced. Using the toolkit, we create the Altered and
Corrupted MIDI Excerpts dataset version 1.0 (ACME v1.0), and propose four tasks
of increasing difficulty to detect, classify, locate, and correct the
degradations. We hypothesize that models trained for these tasks can be useful
in (for example) improving automatic music transcription performance if applied
as a post-processing step. To that end, MDTK includes a script that measures
the distribution of different types of errors in a transcription, and creates a
degraded dataset with similar properties. MDTK's degradations can also be
applied dynamically to a dataset during training (with or without the above
script), generating novel degraded excerpts each epoch. MDTK could also be used
to test the robustness of any system designed to take MIDI (or similar) data as
input (e.g. systems designed for voice separation, metrical alignment, or chord
detection) to such transcription errors or otherwise noisy data. The toolkit
and dataset are both publicly available online, and we encourage contribution
and feedback from the community.
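The abstract describes degradations as functions that take an excerpt (notes with pitch, onset time, and duration) and return a copy with an error introduced. Below is a minimal sketch of one such degradation, a random pitch shift applied to a single note. The function name, note representation, and parameters are illustrative assumptions, not MDTK's actual API.

```python
import random

def pitch_shift(excerpt, max_shift=12, rng=None):
    """Return a copy of the excerpt with one note's pitch randomly shifted.

    A sketch of an MDTK-style degradation: the input is a list of notes,
    each a dict with "pitch" (MIDI note number), "onset", and "duration";
    the output is a new list with exactly one pitch altered.
    """
    rng = rng or random.Random()
    degraded = [dict(note) for note in excerpt]  # copy so the original is untouched
    victim = rng.randrange(len(degraded))
    # Choose a nonzero shift so the degradation always changes something.
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    # Clamp to the valid MIDI pitch range 0-127.
    degraded[victim]["pitch"] = max(0, min(127, degraded[victim]["pitch"] + shift))
    return degraded

excerpt = [
    {"pitch": 60, "onset": 0,    "duration": 500},  # C4
    {"pitch": 64, "onset": 500,  "duration": 500},  # E4
    {"pitch": 67, "onset": 1000, "duration": 500},  # G4
]
degraded = pitch_shift(excerpt, rng=random.Random(0))
```

Because such a function is cheap and seeded, it could be applied on the fly during training, producing a freshly degraded copy of each excerpt every epoch, which is the dynamic-augmentation usage the abstract describes.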
Related papers
- Detecting Music Performance Errors with Transformers (arXiv, 2025-01-03)
Existing tools for music error detection rely on automatic alignment.
There is a lack of sufficient data to train music error detection models.
We present a novel data generation technique capable of creating large-scale synthetic music error datasets.
- Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features (arXiv, 2024-12-17)
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.
We introduce novel methodologies and datasets to overcome these challenges.
We propose MhBART, an encoder-decoder model designed to emulate human writing style.
We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
- YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation (arXiv, 2024-07-05)
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors.
- Parameter-tuning-free Data Entry Error Unlearning with Adaptive Selective Synaptic Dampening (arXiv, 2024-02-06)
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
- Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription (arXiv, 2023-09-27)
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework achieves performance comparable to state-of-the-art instrument-agnostic transcription methods.
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation (arXiv, 2023-08-14)
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models and find that it improves performance compared to just prompting for scores.
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task (arXiv, 2022-11-21)
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit-distance similarity.
- Unaligned Supervision For Automatic Music Transcription in The Wild (arXiv, 2022-04-28)
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report state-of-the-art note-level accuracy on the MAPS dataset, and large favorable margins in cross-dataset evaluations.
- Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction (arXiv, 2021-06-03)
We present a new framework named Tail-to-Tail (TtT) non-autoregressive sequence prediction.
It builds on the observation that most tokens are correct and can be conveyed directly from source to target, while the error positions can be estimated and corrected.
Experimental results on standard datasets, especially on variable-length datasets, demonstrate the effectiveness of TtT in terms of sentence-level Accuracy, Precision, Recall, and F1-Measure.
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding (arXiv, 2021-04-13)
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) show that the method not only outperforms baseline models on real-world (noisy) corpora but also enhances robustness, producing high-quality results in noisy environments.
- Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset (arXiv, 2020-04-01)
The Expanded Groove MIDI dataset (E-GMD) contains 444 hours of audio from 43 drum kits.
We use E-GMD to optimize classifiers for use in downstream generation by predicting expressive dynamics (velocity), and show with listening tests that they produce outputs with improved perceptual quality, despite similar results on classification metrics.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.