MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
- URL: http://arxiv.org/abs/2501.17011v2
- Date: Tue, 04 Feb 2025 16:14:38 GMT
- Title: MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
- Authors: Philippe Pasquier, Jeff Ens, Nathan Fradet, Paul Triana, Davide Rizzotti, Jean-Baptiste Rolland, Maryam Safi
- Abstract summary: MIDI-GPT is a generative system designed for computer-assisted music composition.
It supports the infilling of musical material at the track and bar level, and can condition generation on attributes including instrument type, musical style, note density, polyphony level, and note duration.
We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material.
- Abstract: We present and release MIDI-GPT, a generative system based on the Transformer architecture that is designed for computer-assisted music composition workflows. MIDI-GPT supports the infilling of musical material at the track and bar level, and can condition generation on attributes including: instrument type, musical style, note density, polyphony level, and note duration. In order to integrate these features, we employ an alternative representation for musical material, creating a time-ordered sequence of musical events for each track and concatenating several tracks into a single sequence, rather than using a single time-ordered sequence where the musical events corresponding to different tracks are interleaved. We also propose a variation of our representation allowing for expressiveness. We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material. We also outline several real-world applications of MIDI-GPT, including collaborations with industry partners that explore the integration and evaluation of MIDI-GPT into commercial products, as well as several artistic works produced using it.
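To make the representation concrete, here is a toy sketch of a per-track, concatenated event sequence with prepended attribute controls. All token names and the control scheme below are illustrative assumptions, not MIDI-GPT's actual vocabulary.

```python
# Toy sketch of the track-concatenated representation (hypothetical tokens).
def track_tokens(instrument, density, events):
    # Attribute controls are prepended to the track so generation can be
    # conditioned on them (instrument type, note density, and so on).
    toks = ["TRACK_START", f"INST_{instrument}", f"DENSITY_{density}"]
    for pitch, dur in events:  # time-ordered events within this track only
        toks += [f"NOTE_ON_{pitch}", f"DUR_{dur}"]
    return toks + ["TRACK_END"]

piano = track_tokens("PIANO", 3, [(60, 4), (64, 4), (67, 8)])
bass = track_tokens("BASS", 1, [(36, 8), (43, 8)])

# Tracks are concatenated rather than interleaved in time, so each track is a
# contiguous span, which also makes track-level infilling a span completion.
sequence = piano + bass
print(sequence)
```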
Related papers
- MidiTok Visualizer: a tool for visualization and analysis of tokenized MIDI symbolic music [0.0]
MidiTok Visualizer is a web application designed to facilitate the exploration and visualization of various MIDI tokenization methods from the MidiTok Python package.
arXiv Detail & Related papers (2024-10-27T17:00:55Z)
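For context, the MidiTok package that the visualizer targets exposes tokenizers such as REMI; a minimal sketch, assuming a local file example.mid and arbitrary configuration values.

```python
from pathlib import Path
from miditok import REMI, TokenizerConfig

# Build a REMI tokenizer (one of several schemes MidiTok implements).
config = TokenizerConfig(num_velocities=16, use_chords=True, use_tempos=True)
tokenizer = REMI(config)

# Tokenize a MIDI file; depending on configuration, MidiTok returns either a
# single token sequence or one sequence per track.
seqs = tokenizer(Path("example.mid"))
for seq in (seqs if isinstance(seqs, list) else [seqs]):
    print(seq.tokens[:10])  # human-readable tokens, the kind the tool visualizes
```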
- Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement [0.0]
CoSaRef is a MIDI-to-audio synthesis method that can be developed without MIDI-audio paired datasets.
It first performs concatenative synthesis based on MIDI inputs and then refines the resulting audio into realistic tracks using a diffusion-based deep generative model trained on audio-only datasets.
arXiv Detail & Related papers (2024-10-22T08:01:40Z)
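The two-stage structure can be sketched with stand-ins: everything below (the sine-based "sample library" and the identity "refiner") is a placeholder for the paper's concatenative synthesis and diffusion model, shown only to make the pipeline shape explicit.

```python
import numpy as np

SR = 22050  # toy sample rate

def lookup_sample(pitch: int, dur: float) -> np.ndarray:
    # Stand-in for a sample-library lookup: a decaying sine at the note's pitch.
    freq = 440.0 * 2 ** ((pitch - 69) / 12)
    t = np.linspace(0.0, dur, int(SR * dur), endpoint=False)
    return np.sin(2 * np.pi * freq * t) * np.exp(-3 * t)

def refine(audio: np.ndarray) -> np.ndarray:
    # Stand-in for the diffusion-based refiner (here: identity).
    return audio

# Stage 1: concatenative synthesis from MIDI-like (pitch, duration) events.
rough = np.concatenate([lookup_sample(p, d) for p, d in [(60, 0.5), (64, 0.5), (67, 1.0)]])
# Stage 2: generative refinement of the rough render into a realistic track.
track = refine(rough)
print(track.shape)
```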
- Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
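The bar-alignment idea can be pictured by emitting the same measure of every track before advancing; a toy sketch, with the delimiters and bar contents being illustrative rather than the paper's exact SMT-ABC syntax.

```python
# Per-track bars in an ABC-like shorthand (contents are illustrative).
voice_1 = ["C2 E2 G2", "A2 F2 D2"]
voice_2 = ["C,4 G,2", "F,4 D,2"]

# Synchronized interleaving: bar i of every voice is emitted together, so
# measures from different tracks cannot drift out of alignment.
smt = " ".join("|".join(bars) + " ||" for bars in zip(voice_1, voice_2))
print(smt)
```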
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
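One interleaving strategy described in the MusicGen paper is the "delay" pattern, where codebook k is shifted by k steps so all codebooks can be predicted in parallel at each step; a minimal sketch.

```python
import numpy as np

def delay_interleave(codes: np.ndarray, pad: int = -1) -> np.ndarray:
    """Shift codebook k right by k steps (the "delay" pattern), so the LM emits
    one token per codebook at each step instead of K sequential steps."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = np.arange(24).reshape(4, 6)  # 4 codebooks x 6 timesteps of toy ids
print(delay_interleave(codes))
```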
- GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
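The GETScore layout is easy to picture as a 2D array; a toy sketch with stand-in token ids (the real vocabulary and pitch/duration packing are defined in the paper).

```python
import numpy as np

PAD = 0  # hypothetical token for empty cells

# Rows are tracks (stacked vertically); columns are time positions.
getscore = np.array([
    [60, PAD, 64, PAD, 67, PAD],   # melody track
    [48, PAD, PAD, 48, PAD, PAD],  # bass track
    [36, 38, 36, 38, 36, 38],      # drum track
])

# Any source-target track combination is just a mask over this grid, e.g.
# condition on melody and bass (source) while generating the drums (target).
target_mask = np.zeros_like(getscore, dtype=bool)
target_mask[2, :] = True
print(getscore, target_mask, sep="\n")
```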
- MusIAC: An extensible generative framework for Music Infilling Applications with multi-level Control [11.811562596386253]
Infilling refers to the task of generating musical sections given the surrounding multi-track music.
The proposed framework is extensible to new control tokens, with added controls such as tonal tension per bar and track polyphony level.
We present the model in a Google Colab notebook to enable interactive generation.
arXiv Detail & Related papers (2022-02-11T10:02:21Z)
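Bar-level infilling with control tokens can be sketched as a masked sequence; every token name below is a hypothetical placeholder, not MusIAC's actual vocabulary.

```python
# Hypothetical masked input: the model sees surrounding bars plus per-bar
# controls, and regenerates the [MASK]ed span to satisfy those controls.
context = [
    "BAR_1", "TENSION_LOW", "POLY_2", "C4", "E4",
    "BAR_2", "TENSION_HIGH", "POLY_4", "[MASK]",  # bar to be infilled
    "BAR_3", "TENSION_LOW", "POLY_2", "G4", "E4",
]

# Changing BAR_2's tension or polyphony controls steers what is written into
# the masked span without touching the surrounding music.
print(" ".join(context))
```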
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
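Merging every track into one sequence makes it long, so the MuMIDI idea is to pack a note's attributes into a single step to compensate; a toy sketch, with the attribute set and token names being assumptions.

```python
# MuMIDI-style single sequence over all tracks: each note occupies one step
# that bundles its attributes (the model embeds them jointly). Names are
# illustrative, not the paper's actual vocabulary.
sequence = [
    ("BAR",),
    ("POS_0",), ("TRACK_MELODY",), ("PITCH_72", "DUR_4", "VEL_90"),  # one step
    ("TRACK_BASS",), ("PITCH_48", "DUR_8", "VEL_80"),
    ("POS_4",), ("TRACK_MELODY",), ("PITCH_76", "DUR_4", "VEL_90"),
]
for step in sequence:
    print(step)
```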
- MMM: Exploring Conditional Multi-Track Music Generation with the Transformer [9.569049935824227]
We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence.
This takes advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.
arXiv Detail & Related papers (2020-08-13T02:36:34Z)
- Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip of people playing musical instruments.
We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z)
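As a rough picture of the keypoints-to-MIDI setup, here is a minimal PyTorch stand-in: the paper's graph encoder over the skeleton is replaced by a plain linear projection, and all sizes and the event vocabulary are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class Keypoints2MIDI(nn.Module):
    """Toy stand-in for a video-to-MIDI model: encode per-frame body keypoints,
    then decode MIDI event tokens with cross-attention to the motion encoding.
    (The actual Foley Music model uses a graph encoder over the skeleton.)"""
    def __init__(self, n_joints=25, d=128, vocab=256):
        super().__init__()
        self.kp_proj = nn.Linear(n_joints * 2, d)  # (x, y) per joint, per frame
        self.tok_emb = nn.Embedding(vocab, d)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=4, batch_first=True), num_layers=2
        )
        self.head = nn.Linear(d, vocab)

    def forward(self, keypoints, midi_tokens):
        memory = self.kp_proj(keypoints)           # (B, frames, d) motion encoding
        tgt = self.tok_emb(midi_tokens)            # (B, events, d)
        causal = torch.triu(                       # autoregressive mask
            torch.full((tgt.size(1), tgt.size(1)), float("-inf")), diagonal=1
        )
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(out)                      # next-event logits

model = Keypoints2MIDI()
logits = model(torch.randn(2, 30, 50), torch.randint(0, 256, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 256])
```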