DoReMi: First glance at a universal OMR dataset
- URL: http://arxiv.org/abs/2107.07786v1
- Date: Fri, 16 Jul 2021 09:24:58 GMT
- Title: DoReMi: First glance at a universal OMR dataset
- Authors: Elona Shatri and György Fazekas
- Abstract summary: DoReMi is an OMR dataset that addresses the main challenges of OMR.
It includes over 6400 printed sheet music images with accompanying metadata.
We obtain 64% mean average precision (mAP) in object detection using half of the data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The main challenges of Optical Music Recognition (OMR) come from the nature
of written music, its complexity and the difficulty of finding an appropriate
data representation. This paper provides a first look at DoReMi, an OMR dataset
that addresses these challenges, and a baseline object detection model to
assess its utility. Researchers often approach OMR as a series of small
stages, since existing datasets often cannot support broader research. We
examine the possibility of changing this tendency by providing richer metadata.
Our approach complements existing research; hence DoReMi allows harmonisation
with two existing datasets, DeepScores and MUSCIMA++. DoReMi was generated
using music notation software and includes over 6400 printed sheet music
images with accompanying metadata useful in OMR research. Our dataset provides
OMR metadata, MIDI, MEI, MusicXML and PNG files, each aiding a different stage
of OMR. We obtain 64% mean average precision (mAP) in object detection using
half of the data. Further work includes iterating on the creation process to
support custom OMR models. While we do not claim to have solved the main
challenges in OMR, this dataset opens a new line of discussion that would
ultimately aid that goal.
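Since each DoReMi sample pairs a PNG page image with MIDI, MEI, MusicXML, and OMR-metadata counterparts, a typical first step is indexing the files by page so every modality for a sample can be fetched together. The sketch below is a minimal illustration under an assumed flat file layout; the file names and naming scheme are hypothetical, not the dataset's actual convention:

```python
from pathlib import PurePosixPath

def pair_modalities(filenames):
    """Group DoReMi-style files by page stem so each sample bundles
    every available modality (PNG image, MusicXML, MEI, MIDI, ...)."""
    samples = {}
    for name in filenames:
        p = PurePosixPath(name)
        # Map extension -> file name under the shared page stem.
        samples.setdefault(p.stem, {})[p.suffix.lstrip(".")] = name
    return samples

# Hypothetical file names for one page of one score.
files = ["page_001.png", "page_001.musicxml", "page_001.mei", "page_001.mid"]
index = pair_modalities(files)
# index["page_001"] now maps each modality to its file.
```

Each stage of an OMR pipeline can then pull only the modalities it needs (e.g. PNG plus MusicXML for object detection with symbolic ground truth).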
Related papers
- Practical End-to-End Optical Music Recognition for Pianoform Music [3.69298824193862]
We define a sequential format called Linearized MusicXML, which allows training an end-to-end model directly.
We create a typeset OMR benchmark with MusicXML ground truth based on the OpenScore Lieder corpus.
We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model.
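The summary above does not specify the Linearized MusicXML vocabulary; as a rough illustration of the general idea, the sketch below flattens a toy MusicXML measure into one token per note. The token format (`STEP OCTAVE : DURATION`) is an assumption for illustration, not the paper's actual encoding:

```python
import xml.etree.ElementTree as ET

XML = """<measure>
  <note><pitch><step>C</step><octave>4</octave></pitch><type>quarter</type></note>
  <note><pitch><step>E</step><octave>4</octave></pitch><type>half</type></note>
</measure>"""

def linearize(measure_xml):
    """Flatten a MusicXML measure into a token sequence, one token per note."""
    root = ET.fromstring(measure_xml)
    tokens = []
    for note in root.iter("note"):
        step = note.findtext("pitch/step")
        octave = note.findtext("pitch/octave")
        duration = note.findtext("type")
        tokens.append(f"{step}{octave}:{duration}")
    return tokens

print(linearize(XML))  # ['C4:quarter', 'E4:half']
```

A sequence-to-sequence model can then be trained directly from page image to such a token stream, avoiding intermediate symbol-detection stages.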
arXiv Detail & Related papers (2024-03-20T17:26:22Z)
- Can we predict the Most Replayed data of video streaming platforms? [57.55927378696826]
We explore whether it is possible to predict the Most Replayed (MR) data from YouTube videos.
To this end, we curate a large video benchmark, the YTMR500 dataset, which comprises 500 YouTube videos with MR data annotations.
We evaluate Deep Learning (DL) models of varying complexity on our dataset and perform an extensive ablation study.
arXiv Detail & Related papers (2023-09-12T10:08:33Z)
- Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis.
Many methods have been proposed to reduce fMRI heterogeneity between source and target domains.
However, acquiring source data is challenging due to privacy concerns and data storage burdens in multi-site studies.
We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
- TrOMR: Transformer-Based Polyphonic Optical Music Recognition [26.14383240933706]
We propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR.
We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores.
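The exact form of TrOMR's consistency loss is not given in this summary; generically, a consistency term penalises disagreement between two predictions for the same input (e.g. under different augmentations). A minimal sketch, assuming mean-squared disagreement over predicted class probabilities:

```python
def consistency_loss(p, q):
    """Mean squared disagreement between two predicted class distributions
    p and q (equal-length lists of probabilities for the same input)."""
    assert len(p) == len(q)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

# Identical predictions incur zero penalty; divergent ones are penalised.
print(consistency_loss([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(consistency_loss([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Added to the main recognition loss, such a term encourages the model to produce stable outputs on complex scores.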
arXiv Detail & Related papers (2023-08-18T08:06:27Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks over 8 publicly available datasets, providing a fair and standard assessment of representations from all open-sourced pre-trained models developed on music recordings, which serve as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- An AMR-based Link Prediction Approach for Document-level Event Argument Extraction [51.77733454436013]
Recent works have introduced Abstract Meaning Representation (AMR) for Document-level Event Argument Extraction (Doc-level EAE).
This work reformulates EAE as a link prediction problem on AMR graphs.
We propose a novel graph structure, Tailored AMR Graph (TAG), which compresses less informative subgraphs and edge types, integrates span information, and highlights surrounding events in the same document.
arXiv Detail & Related papers (2023-05-30T16:07:48Z)
- MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition [73.80088682784587]
"Multimodal Generalization" (MMG) aims to study how systems can generalize when data from certain modalities is limited or even completely missing.
MMG consists of two novel scenarios designed to support security and efficiency considerations in real-world applications.
It introduces a new fusion module with modality dropout training, contrastive-based alignment training, and a novel cross-modal loss for better few-shot performance.
arXiv Detail & Related papers (2023-05-12T03:05:40Z)
- Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study [55.964387734180114]
Cross-modal retrieval (CMR) approaches usually focus on object-centric datasets.
This paper focuses on the results and their generalizability across different dataset types.
We select two state-of-the-art CMR models with different architectures.
We determine the relative performance of the selected models on these datasets.
arXiv Detail & Related papers (2023-01-12T18:00:00Z)
- Late multimodal fusion for image and audio music transcription [0.0]
Multimodal image and audio music transcription poses the challenge of effectively combining the information conveyed by the image and audio modalities.
We study four combination approaches in order to merge, for the first time, the hypotheses produced by end-to-end OMR and AMT systems.
Two of the four strategies considered significantly improve over the corresponding standard unimodal recognition frameworks.
arXiv Detail & Related papers (2022-04-06T20:00:33Z)
- A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets [5.54205518616467]
Machine Reading Comprehension (MRC) is a challenging Natural Language Processing (NLP) research field with wide real-world applications.
A lot of MRC models have already surpassed human performance on various benchmark datasets.
This shows the need for improving existing datasets, evaluation metrics, and models to move current MRC models toward "real" understanding.
arXiv Detail & Related papers (2020-06-21T19:18:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.