Related papers: VisionScores -- A system-segmented image score dataset for deep learning tasks

URL: http://arxiv.org/abs/2506.23030v1
Date: Sat, 28 Jun 2025 22:29:23 GMT
Title: VisionScores -- A system-segmented image score dataset for deep learning tasks
Authors: Alejandro Romero Amezcua, Mariano José Juan Rivera Meraz
Abstract summary: VisionScores is the first system-segmented image score dataset. It aims to offer structure-rich, high information-density images for machine and deep learning tasks.
Score: 49.1574468325115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: VisionScores is presented as the first system-segmented image score dataset, aiming to offer structure-rich, high information-density images for machine and deep learning tasks. Restricted to two-handed piano pieces, it was built to account not only for graphic similarity but also for composition patterns, since the creative process is highly instrument-dependent. It provides two scenarios defined by composer and composition type. The first, formed by 14k samples, covers works by different composers of the same composition type, specifically Sonatinas. The second, consisting of 10.8k samples, presents the opposite case: various composition types by a single composer, Franz Liszt. All 24.8k samples are formatted as grayscale JPG images of $128 \times 512$ pixels. VisionScores supplies users not only with the formatted samples but also with the systems' order and the pieces' metadata. Moreover, unsegmented full-page scores and the pre-formatted images are included for further analysis.

Related papers

Visual Motif Identification: Elaboration of a Curated Comparative Dataset and Classification Methods [4.431754853927668]
In cinema, visual motifs are recurrent iconographic compositions that carry artistic or aesthetic significance. Our goal is to recognise and classify these motifs by proposing a new machine learning model that uses a custom dataset to that end. We show how features extracted from a CLIP model can be leveraged by using a shallow network and an appropriate loss to classify images into 20 different motifs, with surprisingly good results.
arXiv Detail & Related papers (2024-10-21T10:50:00Z)
PBSCR: The Piano Bootleg Score Composer Recognition Dataset [5.314803183185992]
PBSCR is a dataset for studying composer recognition of classical piano music. It contains 40,000 62x64 bootleg score images for a 9-class recognition task, 100,000 62x64 bootleg score images for a 100-class recognition task, and 29,310 unlabeled variable-length bootleg score images for pretraining.
arXiv Detail & Related papers (2024-01-30T07:50:32Z)
NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images [64.92809155168595]
This paper introduces a Multi-category Object Counting task to estimate the numbers of different objects in an aerial image. Considering the absence of a dataset for this task, a large-scale dataset is collected, consisting of 3,416 scenes with a resolution of $1024 \times 1024$ pixels. The paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the features of RGB and NIR.
arXiv Detail & Related papers (2024-01-19T07:12:36Z)
SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image. Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
Composer: Creative and Controllable Image Synthesis with Composable Conditions [57.78533372393828]
Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.
arXiv Detail & Related papers (2023-02-20T05:48:41Z)
SceneComposer: Any-Level Semantic Image Synthesis [80.55876413285587]
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels. The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level. We introduce several novel techniques to address the challenges coming with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [95.02406834386814]
Parti treats text-to-image generation as a sequence-to-sequence modeling problem. Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens. PartiPrompts (P2) is a new holistic benchmark of over 1600 English prompts.
arXiv Detail & Related papers (2022-06-22T01:11:29Z)
Deep Learning Based Automated COVID-19 Classification from Computed Tomography Images [0.0]
The paper presents a Convolutional Neural Networks (CNN) model for image classification, aiming at increasing predictive performance for COVID-19 diagnosis. This work proposes a less complex solution based on simply classifying 2D CT-Scan slices of images using their pixels via a 2D CNN model. Despite the simplicity in architecture, the proposed model showed improved quantitative results exceeding state-of-the-art on the same dataset of images.
arXiv Detail & Related papers (2021-11-22T13:35:10Z)
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format. Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation. We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
arXiv Detail & Related papers (2020-07-29T04:13:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.