Related papers: Camera-Based Piano Sheet Music Identification

URL: http://arxiv.org/abs/2007.14579v1
Date: Wed, 29 Jul 2020 03:55:27 GMT
Title: Camera-Based Piano Sheet Music Identification
Authors: Daniel Yang and TJ Tsai
Abstract summary: We use all solo piano sheet music images in the entire IMSLP dataset as a searchable database. We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime. In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.
Score: 19.850248946069023
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a method for large-scale retrieval of piano sheet music images. Our work differs from previous studies on sheet music retrieval in two ways. First, we investigate the problem at a much larger scale than previous studies, using all solo piano sheet music images in the entire IMSLP dataset as a searchable database. Second, we use cell phone images of sheet music as our input queries, which lends itself to a practical, user-facing application. We show that a previously proposed fingerprinting method for sheet music retrieval is far too slow for a real-time application, and we diagnose its shortcomings. We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime while simultaneously boosting retrieval accuracy. In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.
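Mean reciprocal rank (MRR), the accuracy metric reported above, averages 1/rank of the correct database item over all queries. An MRR of 1.0 would mean the correct piece is ranked first for every query. The following is a minimal sketch of the standard definition. It assumes each query has exactly one correct item and that a query whose correct item is never retrieved contributes 0. The function name is illustrative and is not taken from the paper.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the correct item in the result
    list for query i, or None if it was not retrieved (contributes 0).
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for r in ranks:
        if r is None:
            continue  # correct item missing from the results: reciprocal rank 0
        if r < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / r
    return total / len(ranks)


# Four queries whose correct items were ranked 1st, 2nd, not found, and 1st.
print(mean_reciprocal_rank([1, 2, None, 1]))  # 0.625
```

Averaging reciprocal ranks rewards placing the correct match first far more than placing it second or third. That suits a user-facing lookup, where the user mainly looks at the top result.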

Related papers

Refining music sample identification with a self-supervised graph neural network [16.73613870989583]
We propose a lightweight and scalable encoding architecture employing a Graph Neural Network within a contrastive learning framework. Our model uses only 9% of the trainable parameters of the current state-of-the-art system while achieving comparable performance, reaching a mean average precision (mAP) of 44.2%. In addition, because queries in real-world applications are often short in duration, we benchmark our system on short queries using new fine-grained annotations for the Sample100 dataset.
Date: 2025-06-17T16:19:21Z
CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora [3.166549403591528]
This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective long-text to image retrieval. CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively.
Date: 2024-02-23T11:47:16Z
PBSCR: The Piano Bootleg Score Composer Recognition Dataset [5.314803183185992]
PBSCR is a dataset for studying composer recognition of classical piano music. It contains 40,000 62x64 bootleg score images for a 9-class recognition task, 100,000 62x64 bootleg score images for a 100-class recognition task, and 29,310 unlabeled variable-length bootleg score images for pretraining.
Date: 2024-01-30T07:50:32Z
Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval [4.722882736419499]
Cross-modal music retrieval can connect sheet music images to audio recordings. We propose a cross-modal recurrent network that learns joint embeddings to summarize longer passages of corresponding audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.
Date: 2023-09-21T14:30:02Z
MaskSearch: Querying Image Masks at Scale [60.82746984506577]
MaskSearch is a system that focuses on accelerating queries over databases of image masks while guaranteeing the correctness of query results. Experiments with our prototype show that MaskSearch, using indexes approximately 5% of the compressed data size, accelerates individual queries by up to two orders of magnitude.
Date: 2023-05-03T18:28:14Z
Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening [53.1711708318581]
Current image-text retrieval methods suffer from a time complexity that grows with $N$. This paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval.
Date: 2023-03-14T09:36:42Z
DSI++: Updating Transformer Memory with New Documents [95.70264288158766]
We introduce DSI++, a continual learning challenge for DSI to incrementally index new documents. We show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task.
Date: 2022-12-19T18:59:34Z
ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval [51.588385824875886]
Cross-modal retrieval consists of finding images related to a given query text, or vice versa. Many recent methods have proposed effective solutions to the image-text matching problem, mostly using large vision-language (VL) Transformer networks. This paper proposes an ALign And DIstill Network (ALADIN) to bridge the gap between effectiveness and efficiency.
Date: 2022-07-29T16:01:48Z
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval [85.28292877465353]
This paper proposes Hierarchical Vision-Language Pre-Training (HiVLP) for fast Image-Text Retrieval (ITR). Specifically, we design a novel hierarchical retrieval objective, which uses representations of different dimensions for coarse-to-fine ITR.
Date: 2022-05-24T14:32:57Z
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format. Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation. We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
Date: 2020-07-29T04:13:59Z
Learning to Read and Follow Music in Complete Score Sheet Images [8.680081568962997]
We propose the first system that directly performs score following in full-page, completely unprocessed sheet images. Based on incoming audio and a given image of the score, our system directly predicts the most likely position within the page that matches the audio.
Date: 2020-07-21T11:53:22Z
Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice. We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
Date: 2020-07-08T05:56:01Z
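The music sample identification entry in the list above reports mean average precision (mAP). This is the mean, over queries, of average precision (AP): the precision at each rank where a relevant item appears, summed and divided by the number of relevant items. The following is a minimal sketch of the standard non-interpolated definition. It assumes each query has a ranked result list and a set of relevant items. The exact evaluation protocol used in that paper may differ, and the function names are illustrative.

```python
from typing import Hashable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Non-interpolated AP for one query.

    ranked: retrieved items, best first.
    relevant: ground-truth relevant items; any never retrieved contribute 0.
    """
    if not relevant:
        return 0.0
    seen = set()
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        # Count each relevant item once, even if the ranking repeats it.
        if item in relevant and item not in seen:
            seen.add(item)
            hits += 1
            precision_sum += hits / k  # precision at cutoff k
    return precision_sum / len(relevant)


def mean_average_precision(
    queries: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of per-query AP over (ranked_results, relevant_set) pairs."""
    if not queries:
        raise ValueError("need at least one query")
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```

For example, with the ranking `["a", "x", "b"]` and relevant set `{"a", "b"}`, the relevant items appear at ranks 1 and 3. The precisions there are 1/1 and 2/3, so AP is their average, about 0.83. Unlike MRR, AP accounts for every relevant item, not just the first one found.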

This list is automatically generated from the titles and abstracts of the papers on this site.