Related papers: Joint Transcription of Acoustic Guitar Strumming Directions and Chords

URL: http://arxiv.org/abs/2508.07973v1
Date: Mon, 11 Aug 2025 13:34:49 GMT
Title: Joint Transcription of Acoustic Guitar Strumming Directions and Chords
Authors: Sebastian Murgul, Johannes Schimper, Michael Heizmann
Abstract summary: We extend a multimodal approach to guitar strumming transcription by introducing a novel dataset and a deep learning-based transcription model. We collect 90 min of real-world guitar recordings using an ESP32 smartwatch motion sensor and a structured recording protocol. A Convolutional Recurrent Neural Network (CRNN) model is trained to detect strumming events, classify their direction, and identify the corresponding chords using only microphone audio.
Score: 2.5398014196797614
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic transcription of guitar strumming is an underrepresented and challenging task in Music Information Retrieval (MIR), particularly for extracting both strumming directions and chord progressions from audio signals. While existing methods show promise, their effectiveness is often hindered by limited datasets. In this work, we extend a multimodal approach to guitar strumming transcription by introducing a novel dataset and a deep learning-based transcription model. We collect 90 min of real-world guitar recordings using an ESP32 smartwatch motion sensor and a structured recording protocol, complemented by a synthetic dataset of 4h of labeled strumming audio. A Convolutional Recurrent Neural Network (CRNN) model is trained to detect strumming events, classify their direction, and identify the corresponding chords using only microphone audio. Our evaluation demonstrates significant improvements over baseline onset detection algorithms, with a hybrid method combining synthetic and real-world data achieving the highest accuracy for both strumming action detection and chord classification. These results highlight the potential of deep learning for robust guitar strumming transcription and open new avenues for automatic rhythm guitar analysis.

Related papers

Representation-Regularized Convolutional Audio Transformer for Audio Understanding [53.092757178419355]
Bootstrapping representations from scratch is computationally expensive, often requiring extensive training to converge. We propose the Convolutional Audio Transformer (CAT), a unified framework designed to address these challenges.
arXiv Detail & Related papers (2026-01-29T12:16:19Z)
GuitarFlow: Realistic Electric Guitar Synthesis From Tablatures via Flow Matching and Style Transfer [7.72498447842112]
We introduce GuitarFlow, a model designed specifically for electric guitar synthesis. The generative process is guided using tablatures, a ubiquitous and intuitive guitar-specific symbolic format. We show significant improvement in the realism of the generated guitar audio from tablatures.
arXiv Detail & Related papers (2025-10-23T13:31:41Z)
Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music [46.69593319852797]
We transcribe the rhythmic patterns in 410 popular songs and record cover versions where the guitar tracks followed those transcriptions. We detect individual strums within the separated guitar audio, using a pre-trained foundation model (MERT) as a backbone. We show that it is possible to transcribe the rhythmic patterns of the guitar track in polyphonic music with quite high accuracy.
arXiv Detail & Related papers (2025-10-07T10:22:31Z)
Exploring Procedural Data Generation for Automatic Acoustic Guitar Fingerpicking Transcription [2.8544822698499255]
This work investigates a procedural data generation pipeline as an alternative to real audio recordings for training transcription models. Our approach synthesizes training data through four stages: knowledge-based fingerpicking tablature composition, MIDI performance rendering, physical modeling. We train and evaluate a CRNN-based note-tracking model on both real and synthetic datasets, demonstrating that procedural data can be used to achieve reasonable note-tracking results.
arXiv Detail & Related papers (2025-08-11T13:52:17Z)
EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing [54.10773655199149]
We investigate leveraging cross-attention control for efficient audio editing within auto-regressive models. Inspired by image editing methodologies, we develop a Prompt-to-Prompt-like approach that guides edits through cross and self-attention mechanisms.
arXiv Detail & Related papers (2025-07-15T08:44:11Z)
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation [75.86473375730392]
SongGen is a fully open-source, single-stage auto-regressive transformer for controllable song generation. It supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately. To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline.
arXiv Detail & Related papers (2025-02-18T18:52:21Z)
TapToTab: Video-Based Guitar Tabs Generation using AI and Audio Analysis [0.0]
This paper introduces an advanced approach leveraging deep learning, specifically YOLO models for real-time fretboard detection. Experimental results demonstrate substantial improvements in detection accuracy and robustness compared to traditional techniques. This paper aims to revolutionize guitar instruction by automating the creation of guitar tabs from video recordings.
arXiv Detail & Related papers (2024-09-13T08:17:15Z)
Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar [2.5291326778025143]
Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and variational autoencoders (VAEs).
arXiv Detail & Related papers (2023-07-13T10:48:29Z)
Transfer of knowledge among instruments in automatic music transcription [2.0305676256390934]
This work shows how to use easily generated audio from software synthesizers to train a universal model. The model is a good base for further transfer learning to quickly adapt the transcription model to other instruments.
arXiv Detail & Related papers (2023-04-30T08:37:41Z)
Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio. We derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z)
Decision Forest Based EMG Signal Classification with Low Volume Dataset Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience. We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z)
Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances. We report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z)
Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.