Learning to Read and Follow Music in Complete Score Sheet Images
- URL: http://arxiv.org/abs/2007.10736v1
- Date: Tue, 21 Jul 2020 11:53:22 GMT
- Title: Learning to Read and Follow Music in Complete Score Sheet Images
- Authors: Florian Henkel, Rainer Kelz, Gerhard Widmer
- Abstract summary: We propose the first system that directly performs score following in full-page, completely unprocessed sheet images.
Based on incoming audio and a given image of the score, our system directly predicts the most likely position within the page that matches the audio.
- Score: 8.680081568962997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the task of score following in sheet music given as
unprocessed images. While existing work either relies on OMR software to obtain
a computer-readable score representation, or crucially relies on prepared sheet
image excerpts, we propose the first system that directly performs score
following in full-page, completely unprocessed sheet images. Based on incoming
audio and a given image of the score, our system directly predicts the most
likely position within the page that matches the audio, outperforming current
state-of-the-art image-based score followers in terms of alignment precision.
We also compare our method to an OMR-based approach and empirically show that
it can be a viable alternative to such a system.
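The abstract says only that the system predicts the most likely position within the page for the incoming audio. As a minimal sketch, assume (this is not stated in the abstract) that for each audio frame a model yields a 2-D map of scores over page pixels; the position estimate is then the location of the maximum. The function name `most_likely_position` and the toy map are hypothetical, for illustration only.

```python
from typing import List, Tuple


def most_likely_position(prob_map: List[List[float]]) -> Tuple[int, int]:
    """Return (x, y) of the highest-scoring pixel in a 2-D score map.

    Assumption: prob_map[y][x] holds the score that pixel (x, y) of the
    sheet image matches the current audio frame.
    """
    if not prob_map or not prob_map[0]:
        raise ValueError("prob_map must be a non-empty 2-D grid")
    best_x, best_y, best_val = 0, 0, prob_map[0][0]
    for y, row in enumerate(prob_map):
        for x, val in enumerate(row):
            if val > best_val:
                best_x, best_y, best_val = x, y, val
    return best_x, best_y


# Hypothetical 3x4 map for one audio frame (rows = y, columns = x).
toy_map = [
    [0.01, 0.02, 0.01, 0.00],
    [0.05, 0.10, 0.70, 0.05],
    [0.01, 0.03, 0.02, 0.00],
]
position = most_likely_position(toy_map)
print(position)
```

Taking the single maximum is the simplest possible read-out; a real score follower could just as well use a smoothed estimate such as a centre of mass.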
Related papers
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
We introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
- End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music [12.779526750915707]
We present the first truly end-to-end approach for page-level Optical Music Recognition.
Our system processes an entire music score page and outputs a complete transcription in a music encoding format.
The results demonstrate that our system not only successfully transcribes full-page music scores but also outperforms the commercial tool in both zero-shot settings and after fine-tuning with the target domain.
arXiv Detail & Related papers (2024-05-20T15:21:48Z)
- ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval [51.588385824875886]
Cross-modal retrieval consists of finding images related to a given query text or vice versa.
Many recent methods proposed effective solutions to the image-text matching problem, mostly using recent large vision-language (VL) Transformer networks.
This paper proposes an ALign And DIstill Network (ALADIN) to fill in the gap between effectiveness and efficiency.
arXiv Detail & Related papers (2022-07-29T16:01:48Z)
- Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes [91.59435809457659]
Self-Supervised Predictive Learning (SSPL) is a negative-free method for sound localization via explicit positive mining.
SSPL achieves significant improvements of 8.6% cIoU and 3.4% AUC on SoundNet-Flickr compared to the previous best.
arXiv Detail & Related papers (2022-03-25T01:42:42Z)
- Fully Automatic Page Turning on Real Scores [6.230751621285321]
We present a prototype of an automatic page turning system that works directly on real scores, i.e., sheet images.
Our system is based on a multi-modal neural architecture that observes a complete sheet image page as input, listens to an incoming musical performance, and predicts the position in the image.
As a proof of concept we further combine our system with an actual machine that will physically turn the page on command.
arXiv Detail & Related papers (2021-11-12T10:23:14Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Multi-modal Conditional Bounding Box Regression for Music Score Following [7.360807642941713]
This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following.
A conditional neural network architecture is proposed that directly predicts x,y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance.
arXiv Detail & Related papers (2021-05-10T12:43:35Z)
- SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition [31.714928102950594]
We present a novel hybrid system that creates a high performance initial match hypothesis generator.
Sequence descriptors are generated using a temporal convolutional network dubbed SeqNet.
We then perform selective sequential score aggregation using shortlisted single image learnt descriptors to produce an overall place match hypothesis.
arXiv Detail & Related papers (2021-02-23T10:32:10Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
- Camera-Based Piano Sheet Music Identification [19.850248946069023]
We use all solo piano sheet music images in the entire IMSLP dataset as a searchable database.
We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime.
In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.
arXiv Detail & Related papers (2020-07-29T03:55:27Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
We introduce a comprehensive benchmark for local features and robust estimation algorithms.
Our pipeline's modular structure allows easy integration, configuration, and combination of different methods.
We show that with proper settings, classical solutions may still outperform the perceived state of the art.
arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.