TrOMR:Transformer-Based Polyphonic Optical Music Recognition
- URL: http://arxiv.org/abs/2308.09370v1
- Date: Fri, 18 Aug 2023 08:06:27 GMT
- Title: TrOMR:Transformer-Based Polyphonic Optical Music Recognition
- Authors: Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li
- Abstract summary: We propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR.
We also introduce a novel consistency loss function and a reasonable approach for data annotation to improve recognition accuracy for complex music scores.
- Score: 26.14383240933706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Music Recognition (OMR) is an important technology in music and has
been researched for a long time. Previous approaches for OMR are usually based
on CNN for image understanding and RNN for music symbol classification. In this
paper, we propose a transformer-based approach with excellent global perceptual
capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a
novel consistency loss function and a reasonable approach for data annotation
to improve recognition accuracy for complex music scores. Extensive experiments
demonstrate that TrOMR outperforms current OMR methods, especially in
real-world scenarios. We also develop a TrOMR system and build a camera scene
dataset for full-page music scores in real-world settings. The code and datasets will be
made available for reproducibility.
Related papers
- Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription [13.960714900433269]
Sheet Music Transformer is the first end-to-end OMR model designed to transcribe complex musical scores without relying solely on monophonic strategies.
Our model has been tested on two polyphonic music datasets and has proven capable of handling these intricate music structures effectively.
arXiv Detail & Related papers (2024-02-12T11:52:21Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis [61.07542274267568]
We study a new task -- real-world audio-visual scene synthesis -- and a first-of-its-kind NeRF-based approach for multimodal learning.
We propose an acoustic-aware audio generation module that integrates prior knowledge of audio propagation into NeRF.
We present a coordinate transformation module that expresses a view direction relative to the sound source, enabling the model to learn sound source-centric acoustic fields.
arXiv Detail & Related papers (2023-02-04T04:17:19Z)
- ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction.
It can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data.
We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z)
- Specificity-Preserving Federated Learning for MR Image Reconstruction [94.58912814426122]
Federated learning can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction.
Recent FL techniques tend to solve this by enhancing the generalization of the global model.
We propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI).
arXiv Detail & Related papers (2021-12-09T22:13:35Z)
- An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition [24.377724078096144]
Piano and orchestral scores frequently exhibit polyphonic passages, which add a second dimension to the task.
We propose two novel formulations for end-to-end polyphonic OMR.
We observe a new state-of-the-art performance with our multi-sequence detection decoder, RNNDecoder.
arXiv Detail & Related papers (2021-08-03T22:04:40Z)
- DoReMi: First glance at a universal OMR dataset [0.0]
DoReMi is an OMR dataset that addresses the main challenges of OMR.
It includes over 6400 printed sheet music images with accompanying metadata.
We obtain 64% mean average precision (mAP) in object detection using half of the data.
arXiv Detail & Related papers (2021-07-16T09:24:58Z)
- Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN).
The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Residual Recurrent CRNN for End-to-End Optical Music Recognition on Monophonic Scores [8.829800916216275]
We propose an innovative framework that combines a block of Residual Recurrent Convolutional Neural Network with a recurrent-Decoder network.
The experiment results are benchmarked against a publicly available dataset called CAMERA-PRIMUS.
arXiv Detail & Related papers (2020-10-26T08:39:37Z)
- Optical Music Recognition: State of the Art and Major Challenges [0.0]
Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format.
The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet.
Recently, there has been a shift in OMR from using conventional computer vision techniques towards a deep learning approach.
arXiv Detail & Related papers (2020-06-14T12:40:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.