PBSCSR: The Piano Bootleg Score Composer Style Recognition Dataset
- URL: http://arxiv.org/abs/2401.16803v2
- Date: Wed, 7 Feb 2024 06:48:12 GMT
- Title: PBSCSR: The Piano Bootleg Score Composer Style Recognition Dataset
- Authors: Arhan Jain, Alec Bunn, Austin Pham, and TJ Tsai
- Abstract summary: This article motivates, describes, and presents the PBSCSR dataset for studying composer style recognition of piano sheet music.
Our overarching goal was to create a dataset for studying composer style recognition that is "as accessible as MNIST and as challenging as ImageNet".
The dataset itself contains 40,000 62x64 bootleg score images for a 9-way classification task, 100,000 62x64 bootleg score images for a 100-way classification task, and 29,310 unlabeled variable-length bootleg score images for pretraining.
- Score: 5.314803183185992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article motivates, describes, and presents the PBSCSR dataset for
studying composer style recognition of piano sheet music. Our overarching goal
was to create a dataset for studying composer style recognition that is "as
accessible as MNIST and as challenging as ImageNet". To achieve this goal, we
use a previously proposed feature representation of sheet music called a
bootleg score, which encodes the position of noteheads relative to the staff
lines. Using this representation, we sample fixed-length bootleg score
fragments from piano sheet music images on IMSLP. The dataset itself contains
40,000 62x64 bootleg score images for a 9-way classification task, 100,000
62x64 bootleg score images for a 100-way classification task, and 29,310
unlabeled variable-length bootleg score images for pretraining. The labeled
data is presented in a form that mirrors MNIST images, in order to make it
extremely easy to visualize, manipulate, and train models in an efficient
manner. Additionally, we include relevant metadata to allow access to the
underlying raw sheet music images and other related data on IMSLP. We describe
several research tasks that could be studied with the dataset, including
variations of composer style recognition in a few-shot or zero-shot setting.
For tasks with previously proposed models, we release code and baseline
results for future work to compare against. We also discuss open research
questions that the PBSCSR data is especially well suited to address, along
with promising areas for future exploration.
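Because the labeled data mirrors MNIST, each sample is a fixed-size 62x64 binary image that can be batched and flattened the same way MNIST digits are. The sketch below fabricates one such fragment with NumPy to illustrate the layout; the sparsity level and all array shapes beyond 62x64 are assumptions for illustration, not properties guaranteed by the dataset's distribution format.

```python
import numpy as np

# Hypothetical sketch: fabricate one 62x64 bootleg-score-like fragment.
# 62 rows correspond to notehead positions relative to the staff lines,
# 64 columns to successive bootleg score events; entries are binary.
rng = np.random.default_rng(0)
fragment = (rng.random((62, 64)) < 0.05).astype(np.uint8)

# MNIST-style batching: stack fragments into an (N, 62, 64) array.
batch = np.stack([fragment] * 8)

# Flatten for a simple baseline classifier, exactly as one would
# flatten MNIST's 28x28 digits to 784-dimensional vectors.
flat = batch.reshape(len(batch), -1)
print(flat.shape)  # (8, 3968)
```

A real loader would read the dataset's own files instead of fabricating arrays, but any MNIST-style pipeline (visualization, augmentation, minibatch training) transfers directly to this shape.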
Related papers
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Learning Meta-class Memory for Few-Shot Semantic Segmentation [90.28474742651422]
We introduce the concept of meta-class, which is the meta information shareable among all classes.
We propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings.
Our proposed MM-Net achieves 37.5% mIoU on the COCO dataset in 1-shot setting, which is 5.1% higher than the previous state-of-the-art.
arXiv Detail & Related papers (2021-08-06T06:29:59Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the texts associated with the images can help improve accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
- Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format.
Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation.
We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
arXiv Detail & Related papers (2020-07-29T04:13:59Z)
- Camera-Based Piano Sheet Music Identification [19.850248946069023]
We use all solo piano sheet music images in the entire IMSLP dataset as a searchable database.
We propose a novel hashing scheme called dynamic n-gram fingerprinting that significantly reduces runtime.
In experiments on IMSLP data, our proposed method achieves a mean reciprocal rank of 0.85 and an average runtime of 0.98 seconds per query.
arXiv Detail & Related papers (2020-07-29T03:55:27Z)
- Learning to Read and Follow Music in Complete Score Sheet Images [8.680081568962997]
We propose the first system that directly performs score following in full-page, completely unprocessed sheet images.
Based on incoming audio and a given image of the score, our system directly predicts the most likely position within the page that matches the audio.
arXiv Detail & Related papers (2020-07-21T11:53:22Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
For solving these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC)
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.