Camera-Based Piano Sheet Music Identification
- URL: http://arxiv.org/abs/2007.14579v1
- Date: Wed, 29 Jul 2020 03:55:27 GMT
- Title: Camera-Based Piano Sheet Music Identification
- Authors: Daniel Yang and TJ Tsai
- Abstract summary: We use all solo piano sheet music images in the entire IMSLP dataset as a searchable database.
We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime.
In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.
- Score: 19.850248946069023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a method for large-scale retrieval of piano sheet music
images. Our work differs from previous studies on sheet music retrieval in two
ways. First, we investigate the problem at a much larger scale than previous
studies, using all solo piano sheet music images in the entire IMSLP dataset as
a searchable database. Second, we use cell phone images of sheet music as our
input queries, which lends itself to a practical, user-facing application. We
show that a previously proposed fingerprinting method for sheet music retrieval
is far too slow for a real-time application, and we diagnose its shortcomings.
We propose a novel hashing scheme called dynamic n-gram fingerprinting that
significantly reduces runtime while simultaneously boosting retrieval accuracy.
In experiments on IMSLP data, our proposed method achieves a mean reciprocal
rank of 0.85 and an average runtime of 0.98 seconds per query.
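For readers unfamiliar with the accuracy metric quoted above, mean reciprocal rank (MRR) averages 1/rank of the correct item over all queries, so an MRR of 0.85 means the correct piece typically appears at or very near the top of the ranked list. The following is a minimal sketch of the standard definition only; the function name, the toy piece identifiers, and the choice to score a missing item as 0 are illustrative assumptions and are not taken from the paper's code.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries.

    rankings[i] is the ranked list returned for query i (best match first).
    targets[i] is the correct item for query i.
    Assumption: a query whose target is absent from its ranking contributes 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")

    total = 0.0
    for ranked, target in zip(rankings, targets):
        for position, item in enumerate(ranked, start=1):
            if item == target:
                total += 1.0 / position
                break  # only the first occurrence counts
    return total / len(rankings)


# Toy example with hypothetical piece identifiers.
example_rankings = [
    ["piece_a", "piece_b", "piece_c"],  # correct item at rank 1
    ["piece_b", "piece_a", "piece_c"],  # correct item at rank 2
    ["piece_c", "piece_b", "piece_a"],  # correct item at rank 3
    ["piece_x", "piece_y", "piece_z"],  # correct item not retrieved
]
example_targets = ["piece_a", "piece_a", "piece_a", "piece_a"]
example_mrr = mean_reciprocal_rank(example_rankings, example_targets)
```

In this toy case the score is (1 + 1/2 + 1/3 + 0) / 4, which shows how a single missed query pulls the average down much more than a query answered at rank 2.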
Related papers
- CFIR: Fast and Effective Long-Text To Image Retrieval for Large Corpora [3.166549403591528]
This paper presents a two-stage Coarse-to-Fine Index-shared Retrieval (CFIR) framework, designed for fast and effective long-text to image retrieval.
CFIR surpasses existing MLLMs by up to 11.06% in Recall@1000, while reducing training and retrieval times by 68.75% and 99.79%, respectively.
arXiv Detail & Related papers (2024-02-23T11:47:16Z)
- PBSCR: The Piano Bootleg Score Composer Recognition Dataset [5.314803183185992]
PBSCR is a dataset for studying composer recognition of classical piano music.
It contains 40,000 62x64 bootleg score images for a 9-class recognition task, 100,000 62x64 bootleg score images for a 100-class recognition task, and 29,310 unlabeled variable-length bootleg score images for pretraining.
arXiv Detail & Related papers (2024-01-30T07:50:32Z)
- Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval [4.722882736419499]
Cross-modal music retrieval can connect sheet music images to audio recordings.
We propose a cross-modal recurrent network that learns joint embeddings to summarize longer passages of corresponding audio and sheet music.
We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.
arXiv Detail & Related papers (2023-09-21T14:30:02Z)
- MaskSearch: Querying Image Masks at Scale [60.82746984506577]
MaskSearch is a system that focuses on accelerating queries over databases of image masks while guaranteeing the correctness of query results.
Experiments with our prototype show that MaskSearch, using indexes approximately 5% of the compressed data size, accelerates individual queries by up to two orders of magnitude.
arXiv Detail & Related papers (2023-05-03T18:28:14Z)
- Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening [53.1711708318581]
Current image-text retrieval methods suffer from $N$-related time complexity.
This paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval.
arXiv Detail & Related papers (2023-03-14T09:36:42Z)
- DSI++: Updating Transformer Memory with New Documents [95.70264288158766]
We introduce DSI++, a continual learning challenge for DSI to incrementally index new documents.
We show that continual indexing of new documents leads to considerable forgetting of previously indexed documents.
We introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task.
arXiv Detail & Related papers (2022-12-19T18:59:34Z)
- ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval [51.588385824875886]
Cross-modal retrieval consists of finding images related to a given query text, or vice versa.
Many recent methods have proposed effective solutions to the image-text matching problem, mostly using large vision-language (VL) Transformer networks.
This paper proposes an ALign And DIstill Network (ALADIN) to fill in the gap between effectiveness and efficiency.
arXiv Detail & Related papers (2022-07-29T16:01:48Z)
- HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval [85.28292877465353]
This paper proposes Hierarchical Vision-Language Pre-Training (HiVLP) for fast Image-Text Retrieval (ITR).
Specifically, we design a novel hierarchical retrieval objective, which uses the representation of different dimensions for coarse-to-fine ITR.
arXiv Detail & Related papers (2022-05-24T14:32:57Z)
- Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format.
Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation.
We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
arXiv Detail & Related papers (2020-07-29T04:13:59Z)
- Learning to Read and Follow Music in Complete Score Sheet Images [8.680081568962997]
We propose the first system that directly performs score following in full-page, completely unprocessed sheet images.
Based on incoming audio and a given image of the score, our system directly predicts the most likely position within the page that matches the audio.
arXiv Detail & Related papers (2020-07-21T11:53:22Z)
- Best-First Beam Search [78.71330480725668]
We show that the standard implementation of beam search can be made up to 10x faster in practice.
We propose a memory-reduced variant of Best-First Beam Search, which has a similar beneficial search bias in terms of downstream performance.
arXiv Detail & Related papers (2020-07-08T05:56:01Z)