PopMAG: Pop Music Accompaniment Generation
- URL: http://arxiv.org/abs/2008.07703v1
- Date: Tue, 18 Aug 2020 02:28:36 GMT
- Title: PopMAG: Pop Music Accompaniment Generation
- Authors: Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
- Abstract summary: We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
- Score: 190.09996798215738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In pop music, accompaniments are usually played by multiple instruments
(tracks) such as drum, bass, string and guitar, and can make a song more
expressive and catchy when arranged together with its melody. Previous works
usually generate the tracks separately, so the notes from different tracks do
not explicitly depend on each other, which hurts harmony modeling.
To improve harmony, in this paper, we propose a novel MUlti-track MIDI
representation (MuMIDI), which enables simultaneous multi-track generation in a
single sequence and explicitly models the dependency of the notes from
different tracks. While this greatly improves harmony, it unfortunately
enlarges the sequence length and brings the new challenge of long-term music
modeling. We further introduce two new techniques to address this challenge: 1)
We model multiple note attributes (e.g., pitch, duration, velocity) of a
musical note in one step instead of multiple steps, which can shorten the
length of a MuMIDI sequence. 2) We introduce extra long-context as memory to
capture long-term dependency in music. We call our system for pop music
accompaniment generation PopMAG. We evaluate PopMAG on multiple datasets
(LMD, FreeMidi and CPMD, a private dataset of Chinese pop songs) with both
subjective and objective metrics. The results demonstrate the effectiveness of
PopMAG for multi-track harmony modeling and long-term context modeling.
Specifically, PopMAG wins 42%/38%/40% of the votes when compared against
ground-truth musical pieces on the LMD, FreeMidi and CPMD datasets
respectively, and largely
outperforms other state-of-the-art music accompaniment generation models and
multi-track MIDI representations in terms of subjective and objective metrics.
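To make the two ideas above concrete, here is a minimal Python sketch of a MuMIDI-like encoding: all tracks are flattened into one time-ordered sequence so notes from different tracks appear in each other's context, and a note's pitch, duration and velocity are grouped into a single step instead of three separate tokens. The Note class, token tuples and function names are hypothetical illustrations for this sketch, not the paper's actual vocabulary or code.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Note:
    # Hypothetical note container; field names are assumptions for this sketch.
    track: str      # e.g. "melody", "bass", "drum", "guitar", "string"
    bar: int        # bar index in the song
    position: int   # onset position inside the bar (quantized grid step)
    pitch: int      # MIDI pitch, 0-127
    duration: int   # quantized duration bucket
    velocity: int   # quantized velocity bucket

def to_single_sequence(notes: List[Note]) -> List[Tuple]:
    """Flatten multi-track notes into one MuMIDI-like sequence of steps."""
    seq: List[Tuple] = []
    cur_bar = cur_pos = cur_track = None
    # One shared time axis: order by bar, then position, then track, so notes
    # from different tracks at the same time end up adjacent in the sequence.
    for n in sorted(notes, key=lambda n: (n.bar, n.position, n.track)):
        if n.bar != cur_bar:
            seq.append(("bar",))                 # bar boundary token
            cur_bar, cur_pos, cur_track = n.bar, None, None
        if n.position != cur_pos:
            seq.append(("pos", n.position))      # position-in-bar token
            cur_pos, cur_track = n.position, None
        if n.track != cur_track:
            seq.append(("track", n.track))       # track-switch token
            cur_track = n.track
        # Attribute grouping: pitch, duration and velocity are emitted in ONE
        # step rather than three consecutive tokens, shortening the sequence.
        seq.append(("note", n.pitch, n.duration, n.velocity))
    return seq

if __name__ == "__main__":
    notes = [
        Note("melody", 0, 0, 72, 8, 20),
        Note("bass",   0, 0, 36, 16, 18),
        Note("guitar", 0, 8, 64, 4, 16),
    ]
    print(to_single_sequence(notes))
    # [('bar',), ('pos', 0), ('track', 'bass'), ('note', 36, 16, 18),
    #  ('track', 'melody'), ('note', 72, 8, 20), ('pos', 8),
    #  ('track', 'guitar'), ('note', 64, 4, 16)]
```

A single-step compound note token like this is what keeps the flattened multi-track sequence short enough for long-term modeling; the paper's second technique (an extra long-context memory) addresses the remaining length.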
Related papers
- UniMuMo: Unified Text, Music and Motion Generation [57.72514622935806]
We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture.
arXiv Detail & Related papers (2024-10-06T16:04:05Z)
- Melody Is All You Need For Music Generation [10.366088659024685]
We present the Melody Guided Music Generation (MMGen) model, the first approach to use melody to guide music generation.
Specifically, we first align the melody with audio waveforms and their associated descriptions using the multimodal alignment module.
This allows MMGen to generate music that matches the style of the provided audio while also producing music that reflects the content of the given text description.
arXiv Detail & Related papers (2024-09-30T11:13:35Z)
- Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation [39.892059799407434]
MelodyGLM is a multi-task pre-training framework for generating melodies with long-term structure.
We have constructed a large-scale symbolic melody dataset, MelodyNet, containing more than 0.4 million melody pieces.
arXiv Detail & Related papers (2023-09-19T16:34:24Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- MMM: Exploring Conditional Multi-Track Music Generation with the Transformer [9.569049935824227]
We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and concatenate several tracks into a single sequence (see the sketch after this list).
This takes advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.
arXiv Detail & Related papers (2020-08-13T02:36:34Z)
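As referenced in the MMM entry above, here is a minimal, hypothetical sketch of the track-concatenation idea: each track is serialized as its own time-ordered event list and the per-track lists are joined into one sequence that a standard Transformer can model. The Event type and token names are illustrative assumptions, not MMM's actual vocabulary.

```python
from typing import Dict, List, Tuple

# Illustrative event type: ("note_on", pitch), ("time_shift", steps), ("note_off", pitch).
Event = Tuple[str, int]

def concatenate_tracks(tracks: Dict[str, List[Event]]) -> List[Tuple]:
    """Serialize each track in time order, then join all tracks into one sequence."""
    seq: List[Tuple] = [("piece_start",)]
    for name, events in tracks.items():
        seq.append(("track_start", name))   # delimiter marking where a track begins
        seq.extend(events)                  # the track's own time-ordered events
        seq.append(("track_end", name))     # delimiter marking where the track ends
    seq.append(("piece_end",))
    return seq

if __name__ == "__main__":
    tracks = {
        "bass":  [("note_on", 36), ("time_shift", 8), ("note_off", 36)],
        "drums": [("note_on", 42), ("time_shift", 2), ("note_off", 42)],
    }
    print(concatenate_tracks(tracks))
```

Note the contrast with the MuMIDI-style sketch earlier: here tracks are serialized one after another, whereas MuMIDI interleaves all tracks along a shared bar/position axis so that simultaneous notes from different tracks sit next to each other.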
This list is automatically generated from the titles and abstracts of the papers in this site.