Related papers: TrOMR:Transformer-Based Polyphonic Optical Music Recognition

URL: http://arxiv.org/abs/2308.09370v1
Date: Fri, 18 Aug 2023 08:06:27 GMT
Title: TrOMR:Transformer-Based Polyphonic Optical Music Recognition
Authors: Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li
Abstract summary: We propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores.
Score: 26.14383240933706
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores. Extensive experiments demonstrate that TrOMR outperforms current OMR methods, especially in real-world scenarios. We also develop a TrOMR system and build a camera scene dataset for full-page music scores in real-world settings. The code and datasets will be made available for reproducibility.

Related papers

Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities. The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats. One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image. We introduce a music object detector based on YOLOv8, which improves detection performance. Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation [0.0]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI. This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music.
arXiv Detail & Related papers (2024-08-27T12:34:41Z)
End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition. Our system processes an entire music score page and outputs a complete transcription in a music encoding format. The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z)
Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription [13.960714900433269]
Sheet Music Transformer is the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies. Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively.
arXiv Detail & Related papers (2024-02-12T11:52:21Z)
A Unified Representation Framework for the Evaluation of Optical Music Recognition Systems [4.936226952764696]
We identify the need for a common music representation language and propose the Music Tree Notation (MTN) format. This format represents music as a set of primitives that group together into higher-abstraction nodes. We have also developed a specific set of OMR metrics and a typeset score dataset as a proof of concept of this idea.
arXiv Detail & Related papers (2023-12-20T10:45:22Z)
MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels: acoustic, performance, score, and high-level description. We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction. It can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data. We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z)
Specificity-Preserving Federated Learning for MR Image Reconstruction [94.58912814426122]
Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction. Recent FL techniques tend to solve this by enhancing the generalization of the global model. We propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI).
arXiv Detail & Related papers (2021-12-09T22:13:35Z)
DoReMi: First glance at a universal OMR dataset [0.0]
DoReMi is an OMR dataset that addresses the main challenges of OMR. It includes over 6400 printed sheet music images with accompanying metadata. We obtain 64% mean average precision (mAP) in object detection using half of the data.
arXiv Detail & Related papers (2021-07-16T09:24:58Z)
Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture. We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN). The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.