MMM : Exploring Conditional Multi-Track Music Generation with the
Transformer
- URL: http://arxiv.org/abs/2008.06048v2
- Date: Thu, 20 Aug 2020 19:13:39 GMT
- Title: MMM : Exploring Conditional Multi-Track Music Generation with the
Transformer
- Authors: Jeff Ens, Philippe Pasquier
- Abstract summary: We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence.
This takes advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.
- Score: 9.569049935824227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the Multi-Track Music Machine (MMM), a generative system based on
the Transformer architecture that is capable of generating multi-track music.
In contrast to previous work, which represents musical material as a single
time-ordered sequence, where the musical events corresponding to different
tracks are interleaved, we create a time-ordered sequence of musical events for
each track and concatenate several tracks into a single sequence. This takes
advantage of the Transformer's attention mechanism, which can adeptly handle
long-term dependencies. We explore how various representations can offer the
user a high degree of control at generation time, providing an interactive demo
that accommodates track-level and bar-level inpainting, and offers control over
track instrumentation and note density.
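As a rough illustration of the representation described above, the sketch below builds one time-ordered event sequence per track (with instrument and note-density control tokens) and concatenates the tracks into a single sequence. The token names (PIECE_START, TRACK_START, BAR_START, NOTE_ON, ...) and the Note fields are assumptions chosen for illustration and do not necessarily match the paper's exact vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    pitch: int     # MIDI pitch (0-127)
    start: int     # onset position within the bar, in ticks
    duration: int  # note length in ticks


def encode_bar(notes: List[Note]) -> List[str]:
    """Encode one bar as a time-ordered sequence of note events."""
    tokens = ["BAR_START"]
    clock = 0
    for note in sorted(notes, key=lambda n: n.start):
        if note.start > clock:  # advance time to the next onset
            tokens.append(f"TIME_DELTA={note.start - clock}")
            clock = note.start
        tokens += [f"NOTE_ON={note.pitch}", f"DURATION={note.duration}"]
    tokens.append("BAR_END")
    return tokens


def encode_track(instrument: int, density: int, bars: List[List[Note]]) -> List[str]:
    """Encode one track: control tokens first, then its bars in time order.
    The instrument and density tokens are what expose instrumentation and
    note-density control at generation time."""
    tokens = ["TRACK_START", f"INSTRUMENT={instrument}", f"DENSITY={density}"]
    for bar in bars:
        tokens += encode_bar(bar)
    tokens.append("TRACK_END")
    return tokens


def encode_piece(tracks: List[tuple]) -> List[str]:
    """Concatenate the per-track sequences into one sequence, rather than
    interleaving events from different tracks along a shared timeline."""
    tokens = ["PIECE_START"]
    for instrument, density, bars in tracks:
        tokens += encode_track(instrument, density, bars)
    return tokens


if __name__ == "__main__":
    melody = [[Note(60, 0, 4), Note(64, 4, 4)], [Note(67, 0, 8)]]  # two bars
    bass = [[Note(36, 0, 8)], [Note(43, 0, 8)]]                    # two bars
    print(encode_piece([(0, 8, melody), (33, 4, bass)]))
```

In this framing, track-level inpainting roughly corresponds to regenerating the tokens between one TRACK_START/TRACK_END pair while keeping the other tracks fixed, and bar-level inpainting does the same at the BAR_START/BAR_END level.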
Related papers
- SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation [75.86473375730392]
SongGen is a fully open-source, single-stage auto-regressive transformer for controllable song generation.
It supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately.
To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline.
arXiv Detail & Related papers (2025-02-18T18:52:21Z) - MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition [4.152843247686306]
MIDI-GPT is a generative system designed for computer-assisted music composition.
It supports the infilling of musical material at the track and bar level, and can condition generation on attributes including instrument type, musical style, note density, polyphony level, and note duration.
We present experimental results that demonstrate that MIDI-GPT is able to consistently avoid duplicating the musical material it was trained on, generate music that is stylistically similar to the training dataset, and that attribute controls allow enforcing various constraints on the generated material.
arXiv Detail & Related papers (2025-01-28T15:17:36Z) - UniMuMo: Unified Text, Music and Motion Generation [57.72514622935806]
We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture.
arXiv Detail & Related papers (2024-10-06T16:04:05Z) - BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to handle multiple parallel music sequences and generate high-quality music samples conditioned on the given spatiotemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics in terms of fidelity and inference speed.
The subjective evaluations show that models trained on short datasets can generate music of quality comparable to state-of-the-art models, and that BandControlNet outperforms them significantly.
arXiv Detail & Related papers (2024-07-15T06:33:25Z) - Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long
Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods that effectively models and generates long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z) - Anticipatory Music Transformer [60.15347393822849]
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process.
We focus on infilling control tasks, whereby the controls are a subset of the events themselves.
We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset.
arXiv Detail & Related papers (2023-06-14T16:27:53Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z) - MusIAC: An extensible generative framework for Music Infilling
Applications with multi-level Control [11.811562596386253]
Infilling refers to the task of generating musical sections given the surrounding multi-track music.
The proposed framework is extensible to new control tokens; the control tokens added in this work include tonal tension per bar and track polyphony level.
We present the model in a Google Colab notebook to enable interactive generation.
arXiv Detail & Related papers (2022-02-11T10:02:21Z) - MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just
One Transformer VAE [36.9033909878202]
Transformer and variational autoencoders (VAE) have been extensively employed for symbolic (e.g., MIDI) domain music generation.
In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths.
Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.
arXiv Detail & Related papers (2021-05-10T03:44:03Z) - PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG; a sketch contrasting this interleaved layout with MMM's track-concatenated one follows this list.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
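A recurring distinction across these papers is between interleaved single-sequence layouts (e.g., MuMIDI in PopMAG, where events from all tracks share one timeline) and MMM's layout of one time-ordered sequence per track, concatenated into a single sequence. The toy functions below, written against an assumed (track, time, pitch) event format, show the difference; the token strings are illustrative and do not match either paper's actual vocabulary.

```python
def interleave(events):
    """Single time-ordered sequence with all tracks' events mixed together,
    i.e. the layout MMM contrasts itself against."""
    return [f"T{trk}_NOTE={pitch}@{time}"
            for trk, time, pitch in sorted(events, key=lambda e: (e[1], e[0]))]


def concat_by_track(events):
    """One time-ordered sequence per track, tracks concatenated end to end,
    i.e. the layout MMM proposes."""
    sequence = []
    for trk in sorted({t for t, _, _ in events}):
        sequence.append(f"TRACK_START={trk}")
        sequence += [f"NOTE={pitch}@{time}"
                     for t, time, pitch in sorted(events, key=lambda e: e[1])
                     if t == trk]
        sequence.append("TRACK_END")
    return sequence


events = [(0, 0, 60), (1, 0, 36), (0, 4, 64), (1, 8, 43)]  # (track, time, pitch)
print(interleave(events))       # tracks mixed along the time axis
print(concat_by_track(events))  # each track's events kept contiguous
```

Keeping each track contiguous is what makes track-level conditioning and inpainting straightforward, at the cost of pushing cross-track dependencies far apart in the sequence, which is where the Transformer's long-range attention is relied upon.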
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.