A Data-Driven Analysis of Robust Automatic Piano Transcription
- URL: http://arxiv.org/abs/2402.01424v1
- Date: Fri, 2 Feb 2024 14:11:23 GMT
- Title: A Data-Driven Analysis of Robust Automatic Piano Transcription
- Authors: Drew Edwards, Simon Dixon, Emmanouil Benetos, Akira Maezawa, Yuta
Kusaka
- Abstract summary: Recent developments have focused on adapting new neural network architectures to yield more accurate systems.
We show how these models can severely overfit to acoustic properties of the training data.
We achieve state-of-the-art note-onset accuracy of 88.4 F1-score on the MAPS dataset, without seeing any of its training data.
- Score: 16.686703489636734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithms for automatic piano transcription have improved dramatically in
recent years due to new datasets and modeling techniques. Recent developments
have focused primarily on adapting new neural network architectures, such as
the Transformer and Perceiver, in order to yield more accurate systems. In this
work, we study transcription systems from the perspective of their training
data. By measuring their performance on out-of-distribution annotated piano
data, we show how these models can severely overfit to acoustic properties of
the training data. We create a new set of audio for the MAESTRO dataset,
captured automatically in a professional studio recording environment via
Yamaha Disklavier playback. Using various data augmentation techniques when
training with the original and re-performed versions of the MAESTRO dataset, we
achieve state-of-the-art note-onset accuracy of 88.4 F1-score on the MAPS
dataset, without seeing any of its training data. We subsequently analyze these
data augmentation techniques in a series of ablation studies to better
understand their influence on the resulting models.
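The 88.4 F1 figure above is note-onset accuracy, conventionally scored by matching reference and estimated onsets within a ±50 ms tolerance window (as in the mir_eval toolkit). Below is a minimal sketch of that matching for a single pitch, ignoring pitch and offset criteria for brevity; the function name and greedy matching strategy are illustrative, not the paper's actual evaluation code:

```python
def onset_f1(ref_onsets, est_onsets, tolerance=0.05):
    """Greedily match reference and estimated onset times (seconds)
    one-to-one within a tolerance window; return precision, recall, F1."""
    used = [False] * len(est_onsets)
    matched = 0
    for r in sorted(ref_onsets):
        # Find the closest unused estimated onset within the window.
        best, best_dist = None, tolerance
        for i, e in enumerate(est_onsets):
            if not used[i] and abs(e - r) <= best_dist:
                best, best_dist = i, abs(e - r)
        if best is not None:
            used[best] = True
            matched += 1
    precision = matched / len(est_onsets) if est_onsets else 0.0
    recall = matched / len(ref_onsets) if ref_onsets else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice, toolkits such as mir_eval also require the estimated pitch to match the reference pitch before an onset pair counts as correct.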
Related papers
- MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis [35.07663680944459]
Deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks.
The success of deep learning is largely attributed to training on large-scale datasets.
To reduce the cost of such large data volumes, some researchers have proposed dataset distillation.
arXiv Detail & Related papers (2024-08-05T14:16:54Z) - Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models [7.928003786376716]
We propose novel architectures for convolutional recurrent neural networks.
We improve note-state sequence modeling by using a pitchwise LSTM.
We show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset.
arXiv Detail & Related papers (2024-04-10T08:06:15Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Simulation-Enhanced Data Augmentation for Machine Learning Pathloss
Prediction [9.664420734674088]
This paper introduces a novel simulation-enhanced data augmentation method for machine learning pathloss prediction.
Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets.
The integration of synthetic data significantly improves the generalizability of the model in different environments.
arXiv Detail & Related papers (2024-02-03T00:38:08Z) - TRIAGE: Characterizing and auditing training data for improved
regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z) - MADS: Modulated Auto-Decoding SIREN for time series imputation [9.673093148930874]
We propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations.
We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation.
arXiv Detail & Related papers (2023-07-03T09:08:47Z) - Towards Understanding How Data Augmentation Works with Imbalanced Data [17.478900028887537]
We study the effect of data augmentation on three different classifiers, convolutional neural networks, support vector machines, and logistic regression models.
Our research indicates that DA, when applied to imbalanced data, produces substantial changes in model weights, support vectors and feature selection.
We hypothesize that DA works by facilitating variances in data, so that machine learning models can associate changes in the data with labels.
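The hypothesis above, that augmentation works by injecting label-preserving variance into the data, can be illustrated with a minimal oversampling sketch for an imbalanced dataset; the function and Gaussian-jitter noise model are illustrative assumptions, not the paper's method:

```python
import random

def augment_minority(X, y, minority_label, factor=3, noise=0.05):
    """Oversample a minority class by appending jittered copies of its
    samples: each copy keeps the original label but perturbs features,
    adding label-preserving variance to the training set."""
    X_aug, y_aug = list(X), list(y)
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    for _ in range(factor):
        for x in minority:
            X_aug.append([v + random.gauss(0.0, noise) for v in x])
            y_aug.append(minority_label)
    return X_aug, y_aug
```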
arXiv Detail & Related papers (2023-04-12T15:01:22Z) - Convolutional Neural Networks for the classification of glitches in
gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use supervised-learning models trained from scratch on the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z) - Data Scaling Laws in NMT: The Effect of Noise and Architecture [59.767899982937756]
We study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT).
We find that the data scaling exponents are minimally impacted, suggesting that marginally worse architectures or training data can be compensated for by adding more data.
arXiv Detail & Related papers (2022-02-04T06:53:49Z) - Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation [79.47771259100674]
We present two sample-adaptive automatic weighting schemes for data augmentation.
We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive.
On the financial dataset, we show that the methods in combination with a trading strategy lead to improvements in annualized returns of over 50%, and on the time-series data we outperform state-of-the-art models on over half of the datasets, and achieve similar accuracy on the others.
arXiv Detail & Related papers (2021-02-16T17:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.