GAN-based Content-Conditioned Generation of Handwritten Musical Symbols
- URL: http://arxiv.org/abs/2510.17869v1
- Date: Thu, 16 Oct 2025 11:21:53 GMT
- Title: GAN-based Content-Conditioned Generation of Handwritten Musical Symbols
- Authors: Gerard Asbert, Pau Torras, Lei Kang, Alicia Fornés, Josep Lladós
- Abstract summary: This study explores the generation of realistic, handwritten-looking scores by implementing a music symbol-level Generative Adversarial Network (GAN). We have evaluated the visual fidelity of these generated samples, concluding that the generated symbols exhibit a high degree of realism.
- Score: 5.69735546372407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The field of Optical Music Recognition (OMR) is currently hindered by the scarcity of real annotated data, particularly when dealing with handwritten historical musical scores. In similar fields, such as Handwritten Text Recognition, it was proven that synthetic examples produced with image generation techniques could help to train better-performing recognition architectures. This study explores the generation of realistic, handwritten-looking scores by implementing a music symbol-level Generative Adversarial Network (GAN) and assembling its output into a full score using the Smashcima engraving software. We have systematically evaluated the visual fidelity of these generated samples, concluding that the generated symbols exhibit a high degree of realism, marking significant progress in synthetic score generation.
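The abstract does not spell out how the symbol class conditions the generator. A common scheme for content-conditioned GANs, sketched below with purely hypothetical dimensions and names, concatenates a one-hot class label with the latent noise vector before it enters the generator:

```python
import numpy as np

def make_generator_input(z_dim, n_classes, symbol_id, rng):
    """Build a content-conditioned generator input: random latent noise
    concatenated with a one-hot label for the desired musical symbol.
    (Illustrative sketch only; not the paper's published architecture.)"""
    z = rng.standard_normal(z_dim)   # latent noise vector
    y = np.zeros(n_classes)          # one-hot content condition
    y[symbol_id] = 1.0
    return np.concatenate([z, y])    # shape: (z_dim + n_classes,)

rng = np.random.default_rng(0)
v = make_generator_input(z_dim=64, n_classes=32, symbol_id=5, rng=rng)
print(v.shape)  # (96,)
```

At sampling time, sweeping `symbol_id` while holding the noise fixed would yield different symbol classes in a consistent handwriting style, which is the behavior a symbol-level conditional GAN is after.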
Related papers
- Decoding Musical Origins: Distinguishing Human and AI Composers [0.6246322794612152]
YNote is a novel, machine-learning-friendly music notation system. We train an effective classification model capable of distinguishing whether a piece of music was composed by a human or an AI. The model achieves an accuracy of 98.25%, successfully demonstrating that YNote retains sufficient stylistic information.
arXiv Detail & Related papers (2025-09-14T17:50:33Z) - Scaling Self-Supervised Representation Learning for Symbolic Piano Performance [52.661197827466886]
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. We use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings.
arXiv Detail & Related papers (2025-06-30T14:00:14Z) - Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment [6.806050368211496]
We present Text2midi-InferAlign, a novel technique for improving symbolic music generation at inference time. Our method leverages text-to-audio alignment and music structural alignment rewards during inference to encourage the generated music to be consistent with the input caption.
arXiv Detail & Related papers (2025-05-19T03:36:06Z) - KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
KITTEN is a benchmark for Knowledge-InTensive image generaTion on real-world ENtities. We conduct a systematic study of the latest text-to-image models and retrieval-augmented models. Analysis shows that even advanced text-to-image models fail to generate accurate visual details of entities.
arXiv Detail & Related papers (2024-10-15T17:50:37Z) - Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation [0.0]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI.
This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music.
arXiv Detail & Related papers (2024-08-27T12:34:41Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
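The summary mentions "efficient token interleaving patterns" without detail. One such pattern is delay interleaving, sketched below in simplified form (this is an illustration of the general idea, not MusicGen's actual implementation): each codebook stream is shifted by its index so a single-stage LM can emit one token per stream at every combined step.

```python
def delay_interleave(streams):
    """Interleave K parallel token streams by delaying stream k by k steps.
    None marks positions where a stream has no token yet (or anymore).
    Simplified sketch of a 'delay' interleaving pattern."""
    K, T = len(streams), len(streams[0])
    out = []
    for t in range(T + K - 1):
        out.append([streams[k][t - k] if 0 <= t - k < T else None
                    for k in range(K)])
    return out

# Two toy codebook streams of length 3:
steps = delay_interleave([[1, 2, 3], [10, 20, 30]])
# steps[0] == [1, None], steps[1] == [2, 10], steps[3] == [None, 30]
```

The delay costs only K-1 extra steps in total, versus the K-fold sequence-length blowup of flattening the streams one token at a time.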
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low resource Handwritten Text Recognition is a hard problem due to scarce annotated data and very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z) - Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z) - Generative Hierarchical Features from Synthesizing Images [65.66756821069124]
We show that learning to synthesize images can bring remarkable hierarchical visual features that are generalizable across a wide range of applications.
The visual feature produced by our encoder, termed Generative Hierarchical Feature (GH-Feat), has strong transferability to both generative and discriminative tasks.
arXiv Detail & Related papers (2020-07-20T18:04:14Z) - Embeddings as representation for symbolic music [0.0]
A representation technique that encodes music in a way that preserves musical meaning would improve the results of any model trained for computer music tasks.
In this paper, we experiment with embeddings to represent musical notes from 3 different variations of a dataset and analyze if the model can capture useful musical patterns.
arXiv Detail & Related papers (2020-05-19T13:04:02Z)
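The entry above does not reproduce the paper's embedding setup. As a generic illustration, note embeddings are simply vectors in a lookup table compared by similarity; the vocabulary, dimension, and random initialization below are hypothetical stand-ins for values a model would learn from data:

```python
import numpy as np

# Toy embedding table: each note symbol maps to an 8-dim vector.
# In practice these vectors would be learned, not randomly initialized.
rng = np.random.default_rng(42)
vocab = ["C4", "E4", "G4", "C5"]
emb = {note: rng.standard_normal(8) for note in vocab}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(emb["C4"], emb["G4"])  # similarity between two notes
```

After training, nearby vectors would correspond to musically related notes, which is the "useful musical patterns" property the paper probes for.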