The Common Optical Music Recognition Evaluation Framework
- URL: http://arxiv.org/abs/2312.12908v1
- Date: Wed, 20 Dec 2023 10:45:22 GMT
- Title: The Common Optical Music Recognition Evaluation Framework
- Authors: Pau Torras and Sanket Biswas and Alicia Fornés
- Abstract summary: There is no lingua franca shared among OMR datasets that allows comparing systems' performance on equal grounds.
We propose the Music Tree Notation (MTN) format, which represents music as a set of primitives that group together into higher-abstraction nodes.
We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
- Score: 2.4171019220503402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The quality of Optical Music Recognition (OMR) systems is a rather
difficult quantity to measure. There is no lingua franca shared among OMR
datasets that allows comparing systems' performance on equal grounds, since
most of them are specialised in certain approaches. As a result, most
state-of-the-art works currently report metrics that cannot be compared
directly. In this paper we identify the need for a common music representation
language and propose the Music Tree Notation (MTN) format, thanks to which the
definition of standard metrics becomes possible. This format represents music
as a set of primitives that group together into higher-abstraction nodes, a
compromise between fully graph-based and sequential notation formats. We have
also developed a specific set of OMR metrics and a typeset score dataset as a
proof of concept of this idea.
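To make the primitive-and-node idea concrete, here is a minimal sketch of a tree-of-primitives score representation in the spirit of MTN. All class and field names are illustrative assumptions, not the schema the paper defines.

```python
# Sketch of a score tree: primitive glyphs (noteheads, stems, flags,
# accidentals) are leaves that group into higher-abstraction nodes
# (notes, chords, measures). Names are illustrative, not MTN's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Primitive:
    kind: str                      # e.g. "notehead", "stem", "accidental"
    attributes: dict = field(default_factory=dict)

@dataclass
class Node:
    label: str                     # e.g. "note", "chord", "measure"
    primitives: List[Primitive] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

# A quarter-note C4 assembled from two primitives, inside one measure.
c4 = Node("note", primitives=[
    Primitive("notehead", {"pitch": "C4", "duration": "quarter"}),
    Primitive("stem", {"direction": "up"}),
])
measure = Node("measure", children=[c4])
```

Because both leaves and internal nodes carry labels, edit-based comparisons can be computed at the primitive level and at the note or measure level from the same structure, which is the kind of standard metric the abstract alludes to.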
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - Practical End-to-End Optical Music Recognition for Pianoform Music [3.69298824193862]
We define a sequential format called Linearized MusicXML, which makes it possible to train an end-to-end model directly.
We create a typeset OMR benchmark with MusicXML ground truth based on the OpenScore Lieder corpus.
We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model.
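As a rough illustration of what "linearizing" MusicXML means, the sketch below flattens one measure into a token sequence that a sequence-to-sequence model could be trained on. The token vocabulary is an assumption for illustration; it is not the actual Linearized MusicXML specification.

```python
# Flatten one MusicXML measure into a token list a seq2seq OMR model
# could emit. Token names here are invented for the example.
import xml.etree.ElementTree as ET

MUSICXML = """<measure number="1">
  <note><pitch><step>C</step><octave>4</octave></pitch><type>quarter</type></note>
  <note><pitch><step>E</step><octave>4</octave></pitch><type>half</type></note>
</measure>"""

def linearize(measure_xml: str) -> list:
    root = ET.fromstring(measure_xml)
    tokens = ["measure"]
    for note in root.iter("note"):
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        duration = note.findtext("type")
        tokens.append(f"note:{step}{octave}:{duration}")
    tokens.append("barline")
    return tokens

print(linearize(MUSICXML))
# ['measure', 'note:C4:quarter', 'note:E4:half', 'barline']
```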
arXiv Detail & Related papers (2024-03-20T17:26:22Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
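The interleaving idea can be illustrated with the "delay" pattern described in the MusicGen paper: the k-th codebook stream is shifted right by k steps so a single-stage LM can model all streams jointly. The PAD sentinel and integer streams below are assumptions for the sketch.

```python
# Shift codebook stream k right by k positions, padding with PAD,
# so K parallel streams can be predicted by one single-stage LM.
PAD = -1

def delay_interleave(streams):
    K, T = len(streams), len(streams[0])
    out = [[PAD] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(streams):
        for t, token in enumerate(stream):
            out[k][t + k] = token
    return out

codes = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for row in delay_interleave(codes):
    print(row)
# [1, 2, 3, 4, -1, -1]
# [-1, 5, 6, 7, 8, -1]
# [-1, -1, 9, 10, 11, 12]
```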
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - And what if two musical versions don't share melody, harmony, rhythm, or
lyrics? [2.4366811507669124]
We show that an approximated representation of the lyrics is an efficient proxy to discriminate between versions and non-versions.
We then describe how these features complement each other and yield new state-of-the-art performances on two publicly available datasets.
arXiv Detail & Related papers (2022-10-03T22:33:14Z) - SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate such limitations.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that system-level correlations of our proposed metric with a model-based matching function outperforms all competing metrics.
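A toy sketch of sentence-level soft matching: candidate and reference sentences are compared with a similarity function, and each reference is credited with its best-matching candidate. The unigram-F1 matcher below is a simple stand-in for the model-based matching function the paper actually evaluates.

```python
# Soft-match sentences instead of tokens: score each reference by its
# best similarity against any candidate sentence, then average.
def unigram_f1(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    overlap = len(sa & sb)
    p, r = overlap / len(sa), overlap / len(sb)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def soft_match_score(candidates, references):
    best = [max(unigram_f1(c, r) for c in candidates) for r in references]
    return sum(best) / len(best)

cand = ["the model compares sentences", "it uses soft matching"]
ref = ["sentences are compared by the model", "matching is soft not exact"]
print(round(soft_match_score(cand, ref), 3))
```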
arXiv Detail & Related papers (2022-08-01T17:58:05Z) - Towards Context-Aware Neural Performance-Score Synchronisation [2.0305676256390934]
Music synchronisation provides a way to navigate among multiple representations of music in a unified manner.
Traditional synchronisation methods compute alignment using knowledge-driven and performance analysis approaches.
This PhD furthers the development of performance-score synchronisation research by proposing data-driven, context-aware alignment approaches.
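For context, the classical knowledge-driven baseline alluded to here is dynamic time warping (DTW) over per-frame features. A minimal sketch, assuming the performance and the score have already been converted to feature-vector sequences:

```python
# Classic DTW alignment between a performance and a score feature
# sequence; returns the optimal (performance frame, score frame) path.
import numpy as np

def dtw_align(perf: np.ndarray, score: np.ndarray):
    n, m = len(perf), len(score)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(perf[i - 1] - score[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```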
arXiv Detail & Related papers (2022-05-31T16:45:25Z) - Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies compared to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z) - Optical Music Recognition: State of the Art and Major Challenges [0.0]
Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format.
The transcribed copy should allow musicians to compose, play, and edit music starting from a picture of a music sheet.
Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach.
arXiv Detail & Related papers (2020-06-14T12:40:17Z) - Unsupervised Cross-Modal Audio Representation Learning from Unstructured
Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
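A minimal sketch of the triplet objective underlying such a model: an anchor embedding is pulled toward a semantically related (positive) track and pushed away from an unrelated (negative) one by at least a margin. The embedding network and data pipeline are omitted; inputs are assumed to be fixed-size vectors.

```python
# Standard triplet loss on precomputed embeddings; zero when the
# positive is already closer to the anchor than the negative by >= margin.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([1.0, 0.0]); p = np.array([0.9, 0.1]); n = np.array([0.0, 1.0])
print(triplet_loss(a, p, n))  # 0.0: the positive is already much closer
```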
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.