Optical Music Recognition: State of the Art and Major Challenges
- URL: http://arxiv.org/abs/2006.07885v2
- Date: Mon, 22 Jun 2020 16:33:59 GMT
- Title: Optical Music Recognition: State of the Art and Major Challenges
- Authors: Elona Shatri and Gy\"orgy Fazekas
- Abstract summary: Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format.
The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet.
Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optical Music Recognition (OMR) is concerned with transcribing sheet music
into a machine-readable format. The transcribed copy should allow musicians to
compose, play and edit music by taking a picture of a music sheet. Complete
transcription of sheet music would also enable more efficient archival. OMR
facilitates examining sheet music statistically or searching for patterns of
notations, thus helping use cases in digital musicology too. Recently, there
has been a shift in OMR from using conventional computer vision techniques
towards a deep learning approach. In this paper, we review relevant works in
OMR, including fundamental methods and significant outcomes, and highlight
different stages of the OMR pipeline. These stages often lack standard input
and output representation and standardised evaluation. Therefore, comparing
different approaches and evaluating the impact of different processing methods
can become rather complex. This paper provides recommendations for future work,
addressing some of the highlighted issues and represents a position in
furthering this important field of research.
Related papers
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
We introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z) - Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation [0.0]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI.
This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music.
arXiv Detail & Related papers (2024-08-27T12:34:41Z) - Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated in piano repertoire categorized in 9 classes, achieved 41.4% accuracy independently, with a mean squared error (MSE) of 1.7.
arXiv Detail & Related papers (2024-08-01T11:23:42Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition.
Our system processes an entire music score page and outputs a complete transcription in a music encoding format.
The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - Natural Language Processing Methods for Symbolic Music Generation and
Information Retrieval: a Survey [6.416887247454113]
This survey reviews NLP methods applied to symbolic music generation and information retrieval studies.
We first propose an overview of representations of symbolic music adapted from natural language sequential representations.
We describe these models, in particular deep learning models, through different prisms, highlighting music-specialized mechanisms.
arXiv Detail & Related papers (2024-02-27T12:48:01Z) - Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription [13.960714900433269]
Sheet Music Transformer is the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies.
Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively.
arXiv Detail & Related papers (2024-02-12T11:52:21Z) - Late multimodal fusion for image and audio music transcription [0.0]
multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by image and audio modalities.
We study four combination approaches in order to merge, for the first time, the hypotheses regarding end-to-end OMR and AMT systems.
Two of the four strategies considered significantly improve the corresponding unimodal standard recognition frameworks.
arXiv Detail & Related papers (2022-04-06T20:00:33Z) - Contrastive Learning with Positive-Negative Frame Mask for Music
Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with
Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.