DoReMi: First glance at a universal OMR dataset
- URL: http://arxiv.org/abs/2107.07786v1
- Date: Fri, 16 Jul 2021 09:24:58 GMT
- Title: DoReMi: First glance at a universal OMR dataset
- Authors: Elona Shatri and György Fazekas
- Abstract summary: DoReMi is an OMR dataset that addresses the main challenges of OMR.
It includes over 6400 printed sheet music images with accompanying metadata.
We obtain 64% mean average precision (mAP) in object detection using half of the data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The main challenges of Optical Music Recognition (OMR) come from the nature
of written music, its complexity and the difficulty of finding an appropriate
data representation. This paper provides a first look at DoReMi, an OMR dataset
that addresses these challenges, and a baseline object detection model to
assess its utility. Researchers often approach OMR following a set of small
stages, given that existing data often do not satisfy broader research. We
examine the possibility of changing this tendency by presenting more metadata.
Our approach complements existing research; hence DoReMi allows harmonisation
with two existing datasets, DeepScores and MUSCIMA++. DoReMi was generated
using music notation software and includes over 6400 printed sheet music
images with accompanying metadata useful in OMR research. Our dataset provides
OMR metadata, MIDI, MEI, MusicXML and PNG files, each aiding a different stage
of OMR. We obtain 64% mean average precision (mAP) in object detection using
half of the data. Further work includes iterating on the dataset creation
process to support custom OMR models. While we do not claim to have solved the
main challenges of OMR, this dataset opens a new line of discussion that
would ultimately aid that goal.
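To make the intended object-detection use concrete, a minimal sketch follows that pairs a DoReMi page image with per-object bounding boxes and runs an off-the-shelf detector as a stand-in baseline. The file names, the XML tag and field names, and the choice of Faster R-CNN are assumptions for illustration, not the published DoReMi schema or the paper's own model.

```python
# Minimal sketch (not the authors' code). The annotation tags/fields below
# (Node, Top, Left, Width, Height, ClassName) are assumptions for illustration,
# not necessarily the published DoReMi schema.
from pathlib import Path
import xml.etree.ElementTree as ET

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image


def load_page(png_path: Path, xml_path: Path):
    """Load one score page and its (assumed) per-object bounding boxes."""
    image = to_tensor(Image.open(png_path).convert("RGB"))
    boxes, labels = [], []
    for node in ET.parse(xml_path).getroot().iter("Node"):  # tag name assumed
        top = float(node.findtext("Top"))
        left = float(node.findtext("Left"))
        width = float(node.findtext("Width"))
        height = float(node.findtext("Height"))
        boxes.append([left, top, left + width, top + height])  # xyxy format
        labels.append(node.findtext("ClassName"))
    return image, boxes, labels


# Off-the-shelf detector as a stand-in baseline; the paper's own model may differ.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image, gt_boxes, gt_labels = load_page(Path("page_0001.png"), Path("page_0001.xml"))
with torch.no_grad():
    pred = model([image])[0]  # dict with "boxes", "labels", "scores"
print(f"{len(pred['boxes'])} predicted boxes vs. {len(gt_boxes)} annotated objects")
```

A figure such as the reported 64% mAP would then come from a COCO-style evaluation of such predictions after fine-tuning the detector on the notation classes.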
Related papers
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on the detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
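For the detection stage described in the entry above, a rough illustration using the off-the-shelf Ultralytics YOLOv8 API might look like the following; the checkpoint name and the input page are placeholders, and a real music object detector would first be fine-tuned on notation classes.

```python
# Illustrative only: generic YOLOv8 inference on a score page, not the paper's
# released detector. "yolov8n.pt" and "score_page.png" are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # generic pretrained weights (placeholder)
results = model("score_page.png")   # run detection on one page image

for box in results[0].boxes:
    cls_id = int(box.cls)
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(results[0].names[cls_id], round(float(box.conf), 3), (x1, y1, x2, y2))
```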
- End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition.
Our system processes an entire music score page and outputs a complete transcription in a music encoding format.
The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z)
- A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems [4.936226952764696]
We identify the need for a common music representation language and propose the Music Tree Notation (MTN) format.
This format represents music as a set of primitives that group together into higher-abstraction nodes.
We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
arXiv Detail & Related papers (2023-12-20T10:45:22Z)
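The tree-of-primitives idea in the entry above is only summarised here; as a purely hypothetical illustration (class and field names invented, not the actual MTN specification), grouping primitives into higher-abstraction nodes could look like this:

```python
# Purely hypothetical sketch of primitives grouped into higher-level nodes;
# the names below are invented and are not the MTN specification.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MusicNode:
    kind: str                                   # e.g. "notehead", "stem", "note", "measure"
    children: List["MusicNode"] = field(default_factory=list)

    def primitives(self):
        """Yield the leaf-level primitive symbols under this node."""
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.primitives()


# A note grouped from two primitives, nested inside a measure node.
note = MusicNode("note", [MusicNode("notehead"), MusicNode("stem")])
measure = MusicNode("measure", [note])
print([p.kind for p in measure.primitives()])   # ['notehead', 'stem']
```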
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- TrOMR: Transformer-Based Polyphonic Optical Music Recognition [26.14383240933706]
We propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR.
We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores.
arXiv Detail & Related papers (2023-08-18T08:06:27Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- An AMR-based Link Prediction Approach for Document-level Event Argument Extraction [51.77733454436013]
Recent works have introduced Abstract Meaning Representation (AMR) for Document-level Event Argument Extraction (Doc-level EAE).
This work reformulates EAE as a link prediction problem on AMR graphs.
We propose a novel graph structure, Tailored AMR Graph (TAG), which compresses less informative subgraphs and edge types, integrates span information, and highlights surrounding events in the same document.
arXiv Detail & Related papers (2023-05-30T16:07:48Z)
- MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition [73.80088682784587]
"Multimodal Generalization" (MMG) aims to study how systems can generalize when data from certain modalities is limited or even completely missing.
MMG consists of two novel scenarios designed to support security and efficiency considerations in real-world applications.
The proposed approach includes a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal loss for better few-shot performance.
arXiv Detail & Related papers (2023-05-12T03:05:40Z)
- Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study [55.964387734180114]
Cross-modal retrieval (CMR) approaches usually focus on object-centric datasets.
This paper focuses on the results and their generalizability across different dataset types.
We select two state-of-the-art CMR models with different architectures.
We determine the relative performance of the selected models on these datasets.
arXiv Detail & Related papers (2023-01-12T18:00:00Z)
- Late multimodal fusion for image and audio music transcription [0.0]
Multimodal image and audio music transcription poses the challenge of effectively combining the information conveyed by the image and audio modalities.
We study four combination approaches in order to merge, for the first time, the hypotheses from end-to-end OMR and AMT systems.
Two of the four strategies considered significantly improve over the corresponding unimodal standard recognition frameworks.
arXiv Detail & Related papers (2022-04-06T20:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.