PopMAG: Pop Music Accompaniment Generation
- URL: http://arxiv.org/abs/2008.07703v1
- Date: Tue, 18 Aug 2020 02:28:36 GMT
- Title: PopMAG: Pop Music Accompaniment Generation
- Authors: Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
- Abstract summary: We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
- Score: 190.09996798215738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In pop music, accompaniments are usually played by multiple instruments
(tracks) such as drum, bass, string and guitar, and can make a song more
expressive and catchy when arranged together with its melody. Previous works
usually generate the tracks separately, so the notes from different tracks do
not explicitly depend on each other, which hurts harmony modeling.
To improve harmony, in this paper, we propose a novel MUlti-track MIDI
representation (MuMIDI), which enables simultaneous multi-track generation in a
single sequence and explicitly models the dependency of the notes from
different tracks. While this greatly improves harmony, it unfortunately
enlarges the sequence length and brings the new challenge of long-term music
modeling. We further introduce two new techniques to address this challenge: 1)
We model multiple note attributes (e.g., pitch, duration, velocity) of a
musical note in one step instead of multiple steps, which can shorten the
length of a MuMIDI sequence. 2) We introduce extra long-context as memory to
capture long-term dependency in music. We call our system for pop music
accompaniment generation PopMAG. We evaluate PopMAG on multiple datasets
(LMD, FreeMidi and CPMD, a private dataset of Chinese pop songs) with both
subjective and objective metrics. The results demonstrate the effectiveness of
PopMAG for multi-track harmony modeling and long-term context modeling.
Specifically, PopMAG wins 42%/38%/40% of the votes when compared against
ground-truth musical pieces on the LMD, FreeMidi and CPMD datasets
respectively, and largely
outperforms other state-of-the-art music accompaniment generation models and
multi-track MIDI representations in terms of subjective and objective metrics.
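To make the two ideas above concrete, here is a minimal Python sketch of a MuMIDI-like encoding: all tracks are flattened into one time-ordered sequence so notes from different tracks appear in each other's context, and a note's pitch, duration and velocity are grouped into a single step instead of three separate tokens. The Note class, token tuples and function names are hypothetical illustrations for this sketch, not the paper's actual vocabulary or code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Note:
    # Hypothetical note container; field names are assumptions for this sketch.
    track: str      # e.g. "melody", "bass", "drum", "guitar", "string"
    bar: int        # bar index in the song
    position: int   # onset position inside the bar (quantized grid step)
    pitch: int      # MIDI pitch, 0-127
    duration: int   # quantized duration bucket
    velocity: int   # quantized velocity bucket

def to_single_sequence(notes: List[Note]) -> List[Tuple]:
    """Flatten multi-track notes into one MuMIDI-like sequence of steps."""
    seq: List[Tuple] = []
    cur_bar = cur_pos = cur_track = None
    # One shared time axis: order by bar, then position, then track, so notes
    # from different tracks at the same time end up adjacent in the sequence.
    for n in sorted(notes, key=lambda n: (n.bar, n.position, n.track)):
        if n.bar != cur_bar:
            seq.append(("bar",))                 # bar boundary token
            cur_bar, cur_pos, cur_track = n.bar, None, None
        if n.position != cur_pos:
            seq.append(("pos", n.position))      # position-in-bar token
            cur_pos, cur_track = n.position, None
        if n.track != cur_track:
            seq.append(("track", n.track))       # track-switch token
            cur_track = n.track
        # Attribute grouping: pitch, duration and velocity are emitted in ONE
        # step rather than three consecutive tokens, shortening the sequence.
        seq.append(("note", n.pitch, n.duration, n.velocity))
    return seq

if __name__ == "__main__":
    notes = [
        Note("melody", 0, 0, 72, 8, 20),
        Note("bass",   0, 0, 36, 16, 18),
        Note("guitar", 0, 8, 64, 4, 16),
    ]
    print(to_single_sequence(notes))
    # [('bar',), ('pos', 0), ('track', 'bass'), ('note', 36, 16, 18),
    #  ('track', 'melody'), ('note', 72, 8, 20), ('pos', 8),
    #  ('track', 'guitar'), ('note', 64, 4, 16)]
```

A single-step compound note token like this is what keeps the flattened multi-track sequence short enough for long-term modeling; the paper's second technique (an extra long-context memory) addresses the remaining length.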
Related papers
- UniMuMo: Unified Text, Music and Motion Generation [57.72514622935806]
We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture.
arXiv Detail & Related papers (2024-10-06T16:04:05Z)
- Melody Is All You Need For Music Generation [10.366088659024685]
We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation.
Specifically, we first align the melody with audio waveforms and their associated descriptions using the multimodal alignment module.
This allows MMGen to generate music that matches the style of the provided audio while also producing music that reflects the content of the given text description.
arXiv Detail & Related papers (2024-09-30T11:13:35Z)
- Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation [39.892059799407434]
MelodyGLM is a multi-task pre-training framework for generating melodies with long-term structure.
We have constructed a large-scale symbolic melody dataset, MelodyNet, containing more than 0.4 million melody pieces.
arXiv Detail & Related papers (2023-09-19T16:34:24Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- MMM: Exploring Conditional Multi-Track Music Generation with the Transformer [9.569049935824227]
We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence (see the sketch after this list).
This takes advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.
arXiv Detail & Related papers (2020-08-13T02:36:34Z)
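As referenced in the MMM entry above, here is a minimal, hypothetical sketch of the track-concatenation idea: each track is serialized as its own time-ordered event list and the per-track lists are joined into one sequence that a standard Transformer can model. The Event type and token names are illustrative assumptions, not MMM's actual vocabulary.

```python
from typing import Dict, List, Tuple

# Illustrative event type: ("note_on", pitch), ("time_shift", steps), ("note_off", pitch).
Event = Tuple[str, int]

def concatenate_tracks(tracks: Dict[str, List[Event]]) -> List[Tuple]:
    """Serialize each track in time order, then join all tracks into one sequence."""
    seq: List[Tuple] = [("piece_start",)]
    for name, events in tracks.items():
        seq.append(("track_start", name))   # delimiter marking where a track begins
        seq.extend(events)                  # the track's own time-ordered events
        seq.append(("track_end", name))     # delimiter marking where the track ends
    seq.append(("piece_end",))
    return seq

if __name__ == "__main__":
    tracks = {
        "bass":  [("note_on", 36), ("time_shift", 8), ("note_off", 36)],
        "drums": [("note_on", 42), ("time_shift", 2), ("note_off", 42)],
    }
    print(concatenate_tracks(tracks))
```

Note the contrast with the MuMIDI-style sketch earlier: here tracks are serialized one after another, whereas MuMIDI interleaves all tracks along a shared bar/position axis so that simultaneous notes from different tracks sit next to each other.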
This list is automatically generated from the titles and abstracts of the papers in this site.