Exploring Procedural Data Generation for Automatic Acoustic Guitar Fingerpicking Transcription
- URL: http://arxiv.org/abs/2508.07987v1
- Date: Mon, 11 Aug 2025 13:52:17 GMT
- Title: Exploring Procedural Data Generation for Automatic Acoustic Guitar Fingerpicking Transcription
- Authors: Sebastian Murgul, Michael Heizmann
- Abstract summary: This work investigates a procedural data generation pipeline as an alternative to real audio recordings for training transcription models. Our approach synthesizes training data through four stages: knowledge-based fingerpicking tablature composition, MIDI performance rendering, physical modeling, and audio augmentation. We train and evaluate a CRNN-based note-tracking model on both real and synthetic datasets, demonstrating that procedural data can be used to achieve reasonable note-tracking results.
- Score: 2.8544822698499255
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Automatic transcription of acoustic guitar fingerpicking performances remains a challenging task due to the scarcity of labeled training data and legal constraints connected with musical recordings. This work investigates a procedural data generation pipeline as an alternative to real audio recordings for training transcription models. Our approach synthesizes training data through four stages: knowledge-based fingerpicking tablature composition, MIDI performance rendering, physical modeling using an extended Karplus-Strong algorithm, and audio augmentation including reverb and distortion. We train and evaluate a CRNN-based note-tracking model on both real and synthetic datasets, demonstrating that procedural data can be used to achieve reasonable note-tracking results. Finetuning with a small amount of real data further enhances transcription accuracy, improving over models trained exclusively on real recordings. These results highlight the potential of procedurally generated audio for data-scarce music information retrieval tasks.
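The physical-modeling stage is based on an extended Karplus-Strong algorithm. The paper's extensions are not detailed in this summary, so the following is a minimal sketch of only the basic Karplus-Strong plucked-string loop in NumPy; the function name and parameter values are illustrative, not the authors' implementation.

```python
import numpy as np

def karplus_strong(frequency: float, duration: float,
                   sample_rate: int = 44100, decay: float = 0.996) -> np.ndarray:
    """Basic Karplus-Strong plucked-string synthesis (a minimal sketch,
    not the paper's extended variant)."""
    n_samples = int(duration * sample_rate)
    delay = int(sample_rate / frequency)       # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, delay)  # initial noise burst models the pluck
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[i % delay]
        # Averaging adjacent samples acts as a low-pass filter; `decay`
        # controls how quickly the string energy dies away.
        buf[i % delay] = decay * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

# Pluck the open low E string (~82.41 Hz) for two seconds
tone = karplus_strong(82.41, 2.0)
```

The averaging step is what makes the noise burst decay into a string-like tone; extended variants typically add elements such as pick position, string damping, and body filtering.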
Related papers
- Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control [66.46754271097555]
We release a fully open-source system for long-form song generation with fine-grained style conditioning.
The dataset consists of 116k fully licensed synthetic songs with automatically generated lyrics and style descriptions.
We train Muse via single-stage supervised finetuning of a Qwen-based language model extended with discrete audio tokens.
arXiv Detail & Related papers (2026-01-07T14:40:48Z)
- Joint Transcription of Acoustic Guitar Strumming Directions and Chords [2.5398014196797614]
We extend a multimodal approach to guitar strumming transcription by introducing a novel dataset and a deep learning-based transcription model.
We collect 90 min of real-world guitar recordings using an ESP32 smartwatch motion sensor and a structured recording protocol.
A Convolutional Recurrent Neural Network (CRNN) model is trained to detect strumming events, classify their direction, and identify the corresponding chords using only microphone audio.
arXiv Detail & Related papers (2025-08-11T13:34:49Z)
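Both the main paper and the strumming-transcription entry above train CRNN models on audio. Neither summary specifies the architecture, so the PyTorch sketch below shows a generic CRNN for frame-level event tracking; all layer sizes, the input spectrogram shape, and the 88-class output are assumptions, not either paper's configuration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Generic CRNN for frame-level note/event tracking: convolutions
    extract local spectrogram features, a GRU models temporal context,
    and a linear head emits per-frame activations."""
    def __init__(self, n_bins: int = 229, n_classes: int = 88):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),                        # pool over frequency only
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        self.rnn = nn.GRU(64 * (n_bins // 4), 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, n_bins) log-magnitude spectrogram
        x = self.conv(spec.unsqueeze(1))                 # -> (B, C, T, F')
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)   # -> (B, T, C*F')
        x, _ = self.rnn(x)
        return torch.sigmoid(self.head(x))               # per-frame note probabilities

model = CRNN()
frames = model(torch.randn(2, 100, 229))                 # -> (2, 100, 88)
```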
- Music Boomerang: Reusing Diffusion Models for Data Augmentation and Audio Manipulation [49.062766449989525]
Generative models of music audio are typically used to generate output based solely on a text prompt or melody.
Boomerang sampling, recently proposed for the image domain, allows generating output close to an existing example, using any pretrained diffusion model.
arXiv Detail & Related papers (2025-07-07T10:46:07Z)
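Boomerang sampling, as described in the entry above, noises an existing example only part-way through the forward diffusion process and then runs the reverse denoising loop from that intermediate step, yielding an output close to the original. The sketch below shows that call pattern with a placeholder denoiser; `denoise_step`, the noise schedule, and `t_boomerang` are illustrative stand-ins, not the paper's or any library's API.

```python
import torch

def boomerang_sample(x0: torch.Tensor, denoise_step, alphas_cumprod: torch.Tensor,
                     t_boomerang: int) -> torch.Tensor:
    """Sketch of Boomerang-style sampling: partially noise a real example
    to step t_boomerang of the forward process, then denoise from there
    (rather than from pure noise). `denoise_step(x, t)` is a placeholder
    for one reverse step of any pretrained diffusion model."""
    a_bar = alphas_cumprod[t_boomerang]
    # forward process: q(x_t | x_0) = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps
    x = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * torch.randn_like(x0)
    # reverse process: denoise from t_boomerang back to 0, not from T
    for t in range(t_boomerang, 0, -1):
        x = denoise_step(x, t)
    return x

# Toy usage with an identity "denoiser" just to show the call pattern
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(1, 1, 128, 128)   # stand-in for a spectrogram or latent
x_aug = boomerang_sample(x0, lambda x, t: x, alphas_cumprod, t_boomerang=200)
```

Choosing a small `t_boomerang` keeps the output near the original example, which is what makes the technique usable for data augmentation.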
- Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription [2.3249139042158853]
The Fretting-Transformer is an encoder-decoder model that utilizes a T5 transformer architecture to automate the transcription of MIDI sequences into guitar tablature.
By framing the task as a symbolic translation problem, the model addresses key challenges, including string-fret ambiguity and physical playability.
arXiv Detail & Related papers (2025-06-17T06:25:35Z)
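The string-fret ambiguity mentioned in the entry above arises because a single MIDI pitch is playable at several (string, fret) positions. The hypothetical helper below enumerates those candidates for standard tuning; it illustrates the ambiguity itself, not the Fretting-Transformer's tokenization.

```python
# Standard tuning, low E to high E, as MIDI note numbers
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]  # E2 A2 D3 G3 B3 E4
MAX_FRET = 19

def candidate_positions(midi_pitch: int) -> list[tuple[int, int]]:
    """Enumerate every (string, fret) pair that produces `midi_pitch`.
    Multiple candidates per pitch are the string-fret ambiguity a
    tablature transcriber must resolve (illustrative helper)."""
    positions = []
    for string_idx, open_pitch in enumerate(STANDARD_TUNING):
        fret = midi_pitch - open_pitch
        if 0 <= fret <= MAX_FRET:
            positions.append((string_idx, fret))
    return positions

# Middle C (MIDI 60) is playable in four places:
print(candidate_positions(60))  # [(1, 15), (2, 10), (3, 5), (4, 1)]
```

A transcriber must pick one candidate per note while keeping consecutive choices physically playable, which is why the paper treats the task as sequence-to-sequence translation rather than a per-note lookup.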
- Naturalistic Music Decoding from EEG Data via Latent Diffusion Models [14.882764251306094]
This study represents an initial foray into achieving high-quality general music reconstruction using non-invasive EEG data.
We train our models on the public NMED-T dataset and perform quantitative evaluation, proposing neural embedding-based metrics.
arXiv Detail & Related papers (2024-05-15T03:26:01Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Annotation-free Automatic Music Transcription with Scalable Synthetic Data and Adversarial Domain Confusion [0.0]
We propose a transcription model that does not require any MIDI-audio paired data, relying instead on pre-training with scalable synthetic data and adversarial domain confusion.
In experiments, we evaluate methods under the real-world application scenario where training datasets do not include the MIDI annotation of audio.
Our proposed method achieved competitive performance relative to established baseline methods, despite not utilizing any real datasets of paired MIDI-audio.
arXiv Detail & Related papers (2023-12-16T10:07:18Z)
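Adversarial domain confusion, named in the entry above, is commonly implemented with a gradient reversal layer between a shared encoder and a domain classifier (the DANN technique). Whether this paper uses exactly that mechanism is not stated in the summary, so the PyTorch sketch below is a labeled assumption.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, a common building block for adversarial
    domain confusion (DANN-style); assumed here for illustration."""
    @staticmethod
    def forward(ctx, x, lam: float):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity on the forward pass, negated (scaled) gradient on the
        # backward pass, pushing the encoder toward domain-invariant features.
        return -ctx.lam * grad_output, None

def grad_reverse(x: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    return GradReverse.apply(x, lam)

# Usage: encoder features feed a domain classifier through the reversal,
# so minimizing the domain loss *confuses* the domains upstream.
features = torch.randn(8, 128, requires_grad=True)
domain_logits = torch.nn.Linear(128, 2)(grad_reverse(features))
```

Trained this way, the encoder cannot let the domain classifier tell synthetic from real inputs apart, which is what lets a model pre-trained on synthetic audio transfer to real recordings.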
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Transfer of knowledge among instruments in automatic music transcription [2.0305676256390934]
This work shows how synthetic audio, easily generated with software synthesizers, can be used to train a universal model.
It is a good base for further transfer learning to quickly adapt the transcription model to other instruments.
arXiv Detail & Related papers (2023-04-30T08:37:41Z)
- Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
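The entry above reports statistically significant gains in BLEU score and edit distance similarity. For reference, one common formulation of edit distance similarity over token sequences is 1 minus the normalized Levenshtein distance, sketched below; the paper's exact metric definition is not given in the summary, so treat the normalization as an assumption.

```python
def edit_distance_similarity(ref: list[str], hyp: list[str]) -> float:
    """Normalized edit-distance similarity between two token sequences:
    1 - levenshtein(ref, hyp) / max(len(ref), len(hyp)).
    One common formulation; not necessarily the paper's exact metric."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[m][n] / max(m, n, 1)

# One substituted token out of three -> similarity of 2/3
print(edit_distance_similarity("C4 E4 G4".split(), "C4 F4 G4".split()))
```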
- Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model, trained for the task of automatic speech recognition, with melody-derived features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)