GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music
Generation with Transformers
- URL: http://arxiv.org/abs/2302.05393v1
- Date: Fri, 10 Feb 2023 17:43:03 GMT
- Title: GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music
Generation with Transformers
- Authors: Pedro Sarmento, Adarsh Kumar, Yu-Hua Chen, CJ Carr, Zack Zukowski,
Mathieu Barthet
- Abstract summary: We use the DadaGP dataset for guitar tab music generation, a corpus of over 26k songs in GuitarPro and token formats.
We introduce methods to condition a Transformer-XL deep learning model to generate guitar tabs based on desired instrumentation and genre.
Results indicate that the GTR-CTRL methods provide more flexibility and control for guitar-focused symbolic music generation than an unconditioned model.
- Score: 14.025337055088102
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently, symbolic music generation with deep learning techniques has
witnessed steady improvements. Most works on this topic focus on MIDI
representations, but less attention has been paid to symbolic music generation
using guitar tablatures (tabs) which can be used to encode multiple
instruments. Tabs include information on expressive techniques and fingerings
for fretted string instruments in addition to rhythm and pitch. In this work,
we use the DadaGP dataset for guitar tab music generation, a corpus of over 26k
songs in GuitarPro and token formats. We introduce methods to condition a
Transformer-XL deep learning model to generate guitar tabs (GTR-CTRL) based on
desired instrumentation (inst-CTRL) and genre (genre-CTRL). Special control
tokens are prepended to each song in the training corpus. We
assess the performance of the model with and without conditioning. We propose
instrument presence metrics to assess the inst-CTRL model's response to a given
instrumentation prompt. We trained a BERT model for downstream genre
classification and used it to assess the results obtained with the genre-CTRL
model. Statistical analyses evidence significant differences between the
conditioned and unconditioned models. Overall, results indicate that the
GTR-CTRL methods provide more flexibility and control for guitar-focused
symbolic music generation than an unconditioned model.
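The conditioning scheme described above amounts to prefixing each tokenized song with control tokens before training the Transformer-XL. The sketch below illustrates that idea; the token spellings and the toy instrument-presence check are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of GTR-CTRL-style conditioning: control tokens are placed at
# the start of each tokenized song before sequence-model training. Token
# spellings below are assumptions for illustration only.

def build_conditioned_sequence(song_tokens, genre, instruments):
    """Prefix a tokenized song with genre and instrument control tokens."""
    genre_token = f"genre:{genre}"                           # hypothetical spelling
    inst_tokens = [f"inst:{inst}" for inst in instruments]   # hypothetical spelling
    return [genre_token, *inst_tokens, *song_tokens]


def instrument_presence(generated_tokens, instrument):
    """Toy presence check: does any token reference the requested instrument?

    The paper's instrument presence metrics are more elaborate; this only
    illustrates scanning the generated token stream for an instrument.
    """
    return any(tok.startswith(f"{instrument}:") for tok in generated_tokens)


if __name__ == "__main__":
    song = ["distorted0:note:s6:f0", "drums:note:36", "wait:480"]  # toy tokens
    seq = build_conditioned_sequence(song, "metal", ["distorted0", "bass", "drums"])
    print(seq[:4])
    print(instrument_presence(song, "drums"))
```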
Related papers
- TapToTab : Video-Based Guitar Tabs Generation using AI and Audio Analysis [0.0]
This paper introduces an approach leveraging deep learning, specifically YOLO models, for real-time fretboard detection.
Experimental results demonstrate substantial improvements in detection accuracy and robustness compared to traditional techniques.
This paper aims to revolutionize guitar instruction by automating the creation of guitar tabs from video recordings.
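A minimal sketch of how such a video pipeline could look, assuming the ultralytics YOLO package and OpenCV; the weights file and the video path are placeholders, not assets from the paper.

```python
# Sketch of video-based fretboard detection with a YOLO model, in the spirit of
# TapToTab. "fretboard.pt" is a hypothetical custom-trained weights file; the
# paper's actual model and pipeline are not reproduced here.
import cv2                      # pip install opencv-python
from ultralytics import YOLO    # pip install ultralytics

model = YOLO("fretboard.pt")    # assumed detector trained for a fretboard class

cap = cv2.VideoCapture("lesson.mp4")   # placeholder input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)
    for box in results[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])   # detected fretboard region
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cap.release()
```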
arXiv Detail & Related papers (2024-09-13T08:17:15Z)
- MIDI-to-Tab: Guitar Tablature Inference via Masked Language Modeling [6.150307957212576]
We introduce a novel deep learning solution to symbolic guitar tablature estimation.
We train an encoder-decoder Transformer model in a masked language modeling paradigm to assign notes to strings.
The model is first pre-trained on DadaGP, a dataset of over 25K tablatures, and then fine-tuned on a curated set of professionally transcribed guitar performances.
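A toy sketch of the masked-language-modeling framing, in which string assignments are hidden in the input and recovered as targets; the token spellings and note encoding are assumptions for illustration only.

```python
# Toy sketch of the masked setup described for MIDI-to-Tab: string assignments
# are masked in the input and the model learns to recover them. Token spellings
# and the (pitch, string) encoding are assumptions.
MASK = "[MASK]"

def make_string_assignment_example(notes):
    """notes: list of (midi_pitch, string_number) pairs for one guitar phrase."""
    inputs, targets = [], []
    for pitch, string in notes:
        inputs += [f"pitch:{pitch}", MASK]                 # string token hidden
        targets += [f"pitch:{pitch}", f"string:{string}"]  # string token kept
    return inputs, targets

if __name__ == "__main__":
    phrase = [(64, 1), (59, 2), (55, 3)]   # open E, B, G strings
    x, y = make_string_assignment_example(phrase)
    print(x)
    print(y)
```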
arXiv Detail & Related papers (2024-08-09T12:25:23Z)
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting [9.812666469580872]
We propose an expressive acoustic guitar sound synthesis model with a customized input representation to the instrument.
We implement the proposed approach using diffusion-based outpainting which can generate audio with long-term consistency.
Our proposed model has higher audio quality than the baseline model and generates more realistic timbre sounds.
arXiv Detail & Related papers (2024-01-24T14:44:01Z)
- Modeling Bends in Popular Music Guitar Tablatures [49.64902130083662]
Tablature notation is widely used in popular music to transcribe and share guitar musical content.
This paper focuses on bends, which progressively shift the pitch of a note and thereby circumvent the physical limitations of the discrete fretted fingerboard.
Experiments are performed on a corpus of 932 lead guitar tablatures of popular music and show that a decision tree successfully predicts bend occurrences with an F1 score of 0.71 and a limited number of false positive predictions.
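A hedged sketch of that evaluation setup, using scikit-learn's decision tree and F1 score on synthetic data; the actual per-note features are defined in the paper and only guessed at here.

```python
# Decision-tree bend prediction, evaluated with F1. Features and labels below
# are synthetic placeholders; the paper derives its own features from tablatures.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical per-note features: [fret, string, duration, interval to next note]
X = rng.integers(0, 20, size=(1000, 4)).astype(float)
y = rng.integers(0, 2, size=1000)          # 1 = note is bent (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```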
arXiv Detail & Related papers (2023-08-22T07:50:58Z)
- ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production [0.0]
We extend prior work on guitar tablature generation by fine-tuning a pre-trained Transformer model on ProgGP, a custom dataset of 173 progressive metal songs.
Our model is able to generate multiple guitar, bass guitar, drums, piano and orchestral parts.
We demonstrate the value of the model by using it as a tool to create a progressive metal song, fully produced and mixed by a human metal producer.
arXiv Detail & Related papers (2023-07-11T15:19:47Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
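One of the interleaving patterns discussed for MusicGen is a delay pattern, in which codebook stream k is shifted by k steps so that a single-stage LM can model the parallel streams; a minimal sketch, with the padding value and array layout as assumptions.

```python
# Minimal sketch of a "delay" codebook-interleaving pattern of the kind used so
# a single-stage LM can handle several parallel token streams. Pad value and
# array layout are illustrative assumptions.
import numpy as np

def delay_interleave(codes, pad=-1):
    """codes: int array of shape (K, T), one row per codebook stream."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]          # stream k is delayed by k steps
    return out

if __name__ == "__main__":
    codes = np.arange(12).reshape(3, 4)     # 3 codebooks, 4 time steps
    print(delay_interleave(codes))
```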
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
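A small sketch of the two reported measures, BLEU and an edit-distance similarity, computed over token sequences; the paper's exact tokenization and metric configuration may differ.

```python
# BLEU and edit-distance similarity between a generated and a reference token
# sequence. The toy note tokens are placeholders for the paper's symbolic format.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def edit_distance(a, b):
    """Plain Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

ref = "C4 E4 G4 C5".split()
hyp = "C4 E4 G4 G4".split()
bleu = sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method1)
sim = 1 - edit_distance(ref, hyp) / max(len(ref), len(hyp))
print(f"BLEU={bleu:.3f}  edit-distance similarity={sim:.3f}")
```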
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- DadaGP: A Dataset of Tokenized GuitarPro Songs for Sequence Models [25.15855175804765]
DadaGP is a new symbolic music dataset comprising 26,181 song scores in the GuitarPro format covering 739 musical genres.
DadaGP is released with an encoder/decoder which converts GuitarPro files to tokens and back.
We present results of a use case in which DadaGP is used to train a Transformer-based model to generate new songs in GuitarPro format.
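Since the DadaGP token format is plain text, a minimal loader for downstream sequence modelling can be very small; the file path and whitespace splitting rule below are assumptions, and the encoder/decoder released with the dataset remains the authoritative tool for converting to and from GuitarPro files.

```python
# Hedged sketch of preparing DadaGP-style token files for a sequence model:
# read each song's token text, split on whitespace, and build an integer
# vocabulary. File names are placeholders.
from collections import Counter
from pathlib import Path

def load_token_files(paths):
    songs = [Path(p).read_text(encoding="utf-8").split() for p in paths]
    counts = Counter(tok for song in songs for tok in song)
    vocab = {tok: i for i, (tok, _) in enumerate(counts.most_common())}
    encoded = [[vocab[tok] for tok in song] for song in songs]
    return encoded, vocab

# encoded, vocab = load_token_files(["some_song.tokens.txt"])  # placeholder path
```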
arXiv Detail & Related papers (2021-07-30T14:21:36Z)
- Codified audio language modeling learns useful representations for music information retrieval [77.63657430536593]
We show that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks.
To determine if Jukebox's representations contain useful information for MIR, we use them as input features to train shallow models on several MIR tasks.
We observe that representations from Jukebox are considerably stronger than those from models pre-trained on tagging, suggesting that pre-training via codified audio language modeling may address blind spots in conventional approaches.
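A sketch of the shallow-probe setup described above, assuming Jukebox-derived features have already been extracted and saved; the file names and the choice of logistic regression are placeholders for whatever shallow model is used.

```python
# Probe precomputed audio-LM features on a downstream MIR task such as genre
# tagging. The .npy files are hypothetical placeholders for extracted features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X = np.load("jukebox_features.npy")   # placeholder: (n_clips, feature_dim)
y = np.load("genre_labels.npy")       # placeholder: (n_clips,)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", accuracy_score(y_te, probe.predict(X_te)))
```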
arXiv Detail & Related papers (2021-07-12T18:28:50Z)