Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long
Multi-track Symbolic Music Generation
- URL: http://arxiv.org/abs/2401.07532v1
- Date: Mon, 15 Jan 2024 08:41:01 GMT
- Authors: Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju,
Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng
- Abstract summary: We propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
- Score: 50.365392018302416
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Variational Autoencoders (VAEs) constitute a crucial component of neural
symbolic music generation, among which some works have yielded outstanding
results and attracted considerable attention. Nevertheless, previous VAEs still
struggle with overly long feature sequences and with generated results that lack
contextual coherence, so the challenge of modeling long multi-track symbolic
music remains unaddressed. To this end, we propose Multi-view MidiVAE, as
one of the pioneers in VAE methods that effectively model and generate long
multi-track symbolic music. The Multi-view MidiVAE utilizes the two-dimensional
(2-D) representation, OctupleMIDI, to capture relationships among notes while
reducing feature sequence length. Moreover, we focus on instrumental
characteristics and harmony as well as global and local information about the
musical composition by employing a hybrid variational encoding-decoding
strategy to integrate both Track- and Bar-view MidiVAE features. Objective and
subjective experimental results on the CocoChorales dataset demonstrate that,
compared to the baseline, Multi-view MidiVAE exhibits significant improvements
in terms of modeling long multi-track symbolic music.
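The abstract's claim that OctupleMIDI shortens feature sequences can be illustrated with a minimal sketch. This assumes the eight note attributes of the OctupleMIDI encoding as described in MusicBERT; the field names, example values, and the five-tokens-per-note baseline are illustrative assumptions, not details taken from this paper:

```python
# Hypothetical sketch: an OctupleMIDI-style encoding packs one note into a
# single 8-field compound token, whereas a flat event-token encoding expands
# each note into several tokens, so the octuple view yields shorter sequences.
from typing import NamedTuple, List

class OctupleNote(NamedTuple):
    time_signature: int
    tempo: int
    bar: int
    position: int    # onset position within the bar
    instrument: int
    pitch: int
    duration: int
    velocity: int

def event_token_count(n_notes: int, events_per_note: int = 5) -> int:
    """Length of a flat event-token sequence, where each note expands
    into several tokens (events_per_note is an assumed average)."""
    return n_notes * events_per_note

def octuple_token_count(n_notes: int) -> int:
    """One compound token per note in the octuple-style view."""
    return n_notes

# Two notes of an illustrative 4/4, 120 BPM passage.
notes: List[OctupleNote] = [
    OctupleNote(4, 120, 0, 0, 0, 60, 8, 80),
    OctupleNote(4, 120, 0, 4, 0, 64, 8, 80),
]
print(octuple_token_count(len(notes)))  # 2 compound tokens
print(event_token_count(len(notes)))    # 10 flat event tokens
```

The sequence-length ratio grows linearly with the number of attributes folded into each compound token, which is the reduction the abstract attributes to the 2-D representation.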
Related papers
- BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to handle multiple music sequences and generate high-quality music samples conditioned on the given spatiotemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics in terms of fidelity and inference speed.
Subjective evaluations show that BandControlNet, even when trained on short datasets, can generate music of quality comparable to state-of-the-art models, and it significantly outperforms them.
arXiv Detail & Related papers (2024-07-15T06:33:25Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Multitrack Music Transformer [36.91519546327085]
We propose a new multitrack music representation that allows a diverse set of instruments while keeping a short sequence length.
Our proposed Multitrack Music Transformer (MMT) achieves comparable performance with state-of-the-art systems.
arXiv Detail & Related papers (2022-07-14T15:06:37Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE [36.9033909878202]
Transformers and variational autoencoders (VAEs) have been extensively employed for symbolic (e.g., MIDI) domain music generation.
In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths.
Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.
arXiv Detail & Related papers (2021-05-10T03:44:03Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation as PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
- MMM: Exploring Conditional Multi-Track Music Generation with the Transformer [9.569049935824227]
We propose a generative system based on the Transformer architecture that is capable of generating multi-track music.
We create a time-ordered sequence of musical events for each track and combine several tracks into a single sequence.
This takes advantage of the Transformer's attention mechanism, which can adeptly handle long-term dependencies.
arXiv Detail & Related papers (2020-08-13T02:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.