The Common Optical Music Recognition Evaluation Framework
- URL: http://arxiv.org/abs/2312.12908v1
- Date: Wed, 20 Dec 2023 10:45:22 GMT
- Title: The Common Optical Music Recognition Evaluation Framework
- Authors: Pau Torras and Sanket Biswas and Alicia Fornés
- Abstract summary: There is no lingua franca shared among OMR datasets that allows comparing systems' performance on equal grounds.
We propose the Music Tree Notation (MTN) format, which represents music as a set of primitives that group together into higher-abstraction nodes.
We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
- Score: 2.4171019220503402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The quality of Optical Music Recognition (OMR) systems is a rather
difficult quantity to measure. There is no lingua franca shared among OMR
datasets that allows comparing systems' performance on equal grounds, since
most of them are specialised in certain approaches. As a result, most
state-of-the-art works currently report metrics that cannot be compared
directly. In this paper we identify the need for a common music representation
language and propose the Music Tree Notation (MTN) format, thanks to which the
definition of standard metrics becomes possible. This format represents music
as a set of primitives that group together into higher-abstraction nodes, a
compromise between fully graph-based and sequential notation formats. We have
also developed a specific set of OMR metrics and a typeset score dataset as a
proof of concept of this idea.
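To make the primitive-and-node idea concrete, here is a minimal sketch of a tree-of-primitives score representation in the spirit of MTN. All class and field names are illustrative assumptions, not the schema the paper defines.

```python
# Sketch of a score tree: primitive glyphs (noteheads, stems, flags,
# accidentals) are leaves that group into higher-abstraction nodes
# (notes, chords, measures). Names are illustrative, not MTN's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Primitive:
    kind: str                      # e.g. "notehead", "stem", "accidental"
    attributes: dict = field(default_factory=dict)

@dataclass
class Node:
    label: str                     # e.g. "note", "chord", "measure"
    primitives: List[Primitive] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

# A quarter-note C4 assembled from two primitives, inside one measure.
c4 = Node("note", primitives=[
    Primitive("notehead", {"pitch": "C4", "duration": "quarter"}),
    Primitive("stem", {"direction": "up"}),
])
measure = Node("measure", children=[c4])
```

Because both leaves and internal nodes carry labels, edit-based comparisons can be computed at the primitive level and at the note or measure level from the same structure, which is the kind of standard metric the abstract alludes to.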
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - Practical End-to-End Optical Music Recognition for Pianoform Music [3.69298824193862]
We define a sequential format called Linearized MusicXML, which makes it possible to train an end-to-end model directly.
We create a typeset OMR benchmark with MusicXML ground truth based on the OpenScore Lieder corpus.
We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model.
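As a rough illustration of what "linearizing" MusicXML means, the sketch below flattens one measure into a token sequence that a sequence-to-sequence model could be trained on. The token vocabulary is an assumption for illustration; it is not the actual Linearized MusicXML specification.

```python
# Flatten one MusicXML measure into a token list a seq2seq OMR model
# could emit. Token names here are invented for the example.
import xml.etree.ElementTree as ET

MUSICXML = """<measure number="1">
  <note><pitch><step>C</step><octave>4</octave></pitch><type>quarter</type></note>
  <note><pitch><step>E</step><octave>4</octave></pitch><type>half</type></note>
</measure>"""

def linearize(measure_xml: str) -> list:
    root = ET.fromstring(measure_xml)
    tokens = ["measure"]
    for note in root.iter("note"):
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        duration = note.findtext("type")
        tokens.append(f"note:{step}{octave}:{duration}")
    tokens.append("barline")
    return tokens

print(linearize(MUSICXML))
# ['measure', 'note:C4:quarter', 'note:E4:half', 'barline']
```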
arXiv Detail & Related papers (2024-03-20T17:26:22Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
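The interleaving idea can be illustrated with the "delay" pattern described in the MusicGen paper: the k-th codebook stream is shifted right by k steps so a single-stage LM can model all streams jointly. The PAD sentinel and integer streams below are assumptions for the sketch.

```python
# Shift codebook stream k right by k positions, padding with PAD,
# so K parallel streams can be predicted by one single-stage LM.
PAD = -1

def delay_interleave(streams):
    K, T = len(streams), len(streams[0])
    out = [[PAD] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(streams):
        for t, token in enumerate(stream):
            out[k][t + k] = token
    return out

codes = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for row in delay_interleave(codes):
    print(row)
# [1, 2, 3, 4, -1, -1]
# [-1, 5, 6, 7, 8, -1]
# [-1, -1, 9, 10, 11, 12]
```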
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - And what if two musical versions don't share melody, harmony, rhythm, or
lyrics? [2.4366811507669124]
We show that an approximated representation of the lyrics is an efficient proxy to discriminate between versions and non-versions.
We then describe how these features complement each other and yield new state-of-the-art performances on two publicly available datasets.
arXiv Detail & Related papers (2022-10-03T22:33:14Z) - SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate such limitations.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that system-level correlations of our proposed metric with a model-based matching function outperforms all competing metrics.
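A toy sketch of sentence-level soft matching: candidate and reference sentences are compared with a similarity function, and each reference is credited with its best-matching candidate. The unigram-F1 matcher below is a simple stand-in for the model-based matching function the paper actually evaluates.

```python
# Soft-match sentences instead of tokens: score each reference by its
# best similarity against any candidate sentence, then average.
def unigram_f1(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    overlap = len(sa & sb)
    p, r = overlap / len(sa), overlap / len(sb)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def soft_match_score(candidates, references):
    best = [max(unigram_f1(c, r) for c in candidates) for r in references]
    return sum(best) / len(best)

cand = ["the model compares sentences", "it uses soft matching"]
ref = ["sentences are compared by the model", "matching is soft not exact"]
print(round(soft_match_score(cand, ref), 3))
```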
arXiv Detail & Related papers (2022-08-01T17:58:05Z) - Towards Context-Aware Neural Performance-Score Synchronisation [2.0305676256390934]
Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and performance analysis approaches.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
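For context, the classical knowledge-driven baseline alluded to here is dynamic time warping (DTW) over per-frame features. A minimal sketch, assuming the performance and the score have already been converted to feature-vector sequences:

```python
# Classic DTW alignment between a performance and a score feature
# sequence; returns the optimal (performance frame, score frame) path.
import numpy as np

def dtw_align(perf: np.ndarray, score: np.ndarray):
    n, m = len(perf), len(score)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(perf[i - 1] - score[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```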
arXiv Detail & Related papers (2022-05-31T16:45:25Z) - Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies compared to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z) - Optical Music Recognition: State of the Art and Major Challenges [0.0]
Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format.
The transcribed copy should allow musicians to compose, play, and edit music starting from a picture of a music sheet.
Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach.
arXiv Detail & Related papers (2020-06-14T12:40:17Z) - Unsupervised Cross-Modal Audio Representation Learning from Unstructured
Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
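A minimal sketch of the triplet objective underlying such a model: an anchor embedding is pulled toward a semantically related (positive) track and pushed away from an unrelated (negative) one by at least a margin. The embedding network and data pipeline are omitted; inputs are assumed to be fixed-size vectors.

```python
# Standard triplet loss on precomputed embeddings; zero when the
# positive is already closer to the anchor than the negative by >= margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0]); p = np.array([0.9, 0.1]); n = np.array([0.0, 1.0])
print(triplet_loss(a, p, n))  # 0.0: the positive is already much closer
```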
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.