TrOMR:Transformer-Based Polyphonic Optical Music Recognition
- URL: http://arxiv.org/abs/2308.09370v1
- Date: Fri, 18 Aug 2023 08:06:27 GMT
- Title: TrOMR:Transformer-Based Polyphonic Optical Music Recognition
- Authors: Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li
- Abstract summary: We propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR.
We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores.
- Score: 26.14383240933706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Music Recognition (OMR) is an important technology in music and has
been researched for a long time. Previous approaches for OMR are usually based
on CNN for image understanding and RNN for music symbol classification. In this
paper, we propose a transformer-based approach with excellent global perceptual
capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a
novel consistency loss function and a reasonable approach for data annotation
to improve recognition accuracy for complex music scores. Extensive experiments
demonstrate that TrOMR outperforms current OMR methods, especially in
real-world scenarios. We also develop a TrOMR system and build a camera scene
dataset for full-page music scores in real-world. The code and datasets will be
made available for reproducibility.
Related papers
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z) - Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
We introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z) - Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation [0.0]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI.
This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music.
arXiv Detail & Related papers (2024-08-27T12:34:41Z) - End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition.
Our system processes an entire music score page and outputs a complete transcription in a music encoding format.
The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z) - Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription [13.960714900433269]
Sheet Music Transformer is the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies.
Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively.
arXiv Detail & Related papers (2024-02-12T11:52:21Z) - A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems [4.936226952764696]
We identify the need for a common music representation language and propose the Music Tree Notation (MTN) format.
This format represents music as a set of primitives that group together into higher-abstraction nodes.
We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
arXiv Detail & Related papers (2023-12-20T10:45:22Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely textbfReconFormer, for MRI reconstruction.
It can iteratively reconstruct high fertility magnetic resonance images from highly under-sampled k-space data.
We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z) - Specificity-Preserving Federated Learning for MR Image Reconstruction [94.58912814426122]
Federated learning can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction.
Recent FL techniques tend to solve this by enhancing the generalization of the global model.
We propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI)
arXiv Detail & Related papers (2021-12-09T22:13:35Z) - DoReMi: First glance at a universal OMR dataset [0.0]
DoReMi is an OMR dataset that addresses the main challenges of OMR.
It includes over 6400 printed sheet music images with accompanying metadata.
We obtain 64% mean average precision (mAP) in object detection using half of the data.
arXiv Detail & Related papers (2021-07-16T09:24:58Z) - Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolu?tional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network(CRNN)
The proposed method achieves significant improvements over the compressed sensing and popular deep learning-based methods with less number of trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.