Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems
- URL: http://arxiv.org/abs/2408.04737v1
- Date: Thu, 8 Aug 2024 19:40:28 GMT
- Title: Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems
- Authors: Lukáš Samuel Marták, Patricia Hu, Gerhard Widmer
- Abstract summary: Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
We identify two primary sources of distribution shift: the music, and the sound.
We evaluate the performance of several SotA AMT systems on two new experimental test sets.
- Score: 3.5570874721859016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music. The State-of-the-Art (SotA) benchmarks have been dominated by deep learning systems. Due to the scarcity of high quality data, they are usually trained and evaluated exclusively or predominantly on classical piano music. Unfortunately, that hinders our ability to understand how they generalize to other music. Previous works have revealed several aspects of memorization and overfitting in these systems. We identify two primary sources of distribution shift: the music, and the sound. Complementing recent results on the sound axis (i.e. acoustics, timbre), we investigate the musical one (i.e. note combinations, dynamics, genre). We evaluate the performance of several SotA AMT systems on two new experimental test sets which we carefully construct to emulate different levels of musical distribution shift. Our results reveal a stark performance gap, shedding further light on the Corpus Bias problem, and the extent to which it continues to trouble these systems.
Related papers
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated in piano repertoire categorized in 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
- Cluster and Separate: a GNN Approach to Voice and Staff Prediction for Score Engraving [5.572472212662453]
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves.
We propose an end-to-end system based on graph neural networks that clusters notes that belong to the same chord and connects them with edges if they are part of a voice.
arXiv Detail & Related papers (2024-07-15T14:36:13Z)
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions [10.957528713294874]
This study focuses on the perception of music performances when contextual factors, such as room acoustics and instrument, change.
We propose to distinguish the concept of "performance" from that of "interpretation", which expresses the "artistic intention".
arXiv Detail & Related papers (2022-02-24T18:09:22Z)
- Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach [74.09517278785519]
We introduce a grading function that evaluates four-part chorales in the style of J.S. Bach along important musical features.
We show that the function is both interpretable and outperforms human experts at discriminating Bach chorales from model-generated ones.
arXiv Detail & Related papers (2020-06-23T21:02:55Z)
- Optical Music Recognition: State of the Art and Major Challenges [0.0]
Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format.
The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet.
Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach.
arXiv Detail & Related papers (2020-06-14T12:40:17Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)