Symphony Generation with Permutation Invariant Language Model
- URL: http://arxiv.org/abs/2205.05448v1
- Date: Tue, 10 May 2022 13:08:49 GMT
- Title: Symphony Generation with Permutation Invariant Language Model
- Authors: Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li,
Feng Yu and Maosong Sun
- Abstract summary: We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as the backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies when compared with human compositions.
- Score: 57.75739773758614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a symbolic symphony music generation solution,
SymphonyNet, based on a permutation invariant language model. To bridge the gap
between text generation and the symphony generation task, we propose a novel
Multi-track Multi-instrument Repeatable (MMR) representation with a particular
3-D positional embedding and a modified Byte Pair Encoding algorithm (Music
BPE) for music tokens. A novel linear transformer decoder architecture is
introduced as the backbone for modeling extra-long sequences of symphony
tokens. Meanwhile, we train the decoder to learn automatic orchestration as a
joint task by masking instrument information from the input. We also introduce
a large-scale symbolic symphony dataset to advance symphony generation
research. Our empirical results show that the proposed approach can generate
coherent, novel, complex, and harmonious symphonies when compared with human
compositions, making it a pioneering solution for multi-track multi-instrument
symbolic music generation.
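To make the Music BPE idea above concrete, here is a minimal sketch of byte-pair-style merging over symbolic music tokens. It is generic adjacent-pair BPE in Python; the toy pitch tokens, the `+` merge convention, and the stopping rule are illustrative assumptions, not SymphonyNet's actual algorithm, which operates on its MMR token representation.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Count adjacent token pairs across all sequences; return the most common."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def music_bpe(sequences, num_merges):
    """Learn `num_merges` merge rules over music token sequences."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        new_token = pair[0] + "+" + pair[1]  # merged super-token, e.g. part of a chord
        merges.append((pair, new_token))
        sequences = [merge_pair(s, pair, new_token) for s in sequences]
    return merges, sequences

# Toy corpus: recurring co-occurring pitches get merged into chord-like tokens.
corpus = [["C4", "E4", "G4", "C4", "E4", "G4"], ["C4", "E4", "G4", "B4"]]
rules, encoded = music_bpe(corpus, num_merges=2)
print(rules)    # e.g. [(('C4', 'E4'), 'C4+E4'), (('C4+E4', 'G4'), 'C4+E4+G4')]
print(encoded)  # e.g. [['C4+E4+G4', 'C4+E4+G4'], ['C4+E4+G4', 'B4']]
```

In this toy corpus the recurring C4-E4-G4 triad collapses into a single chord-like token after two merges, which is the kind of sequence shortening that makes extra-long symphony token streams more tractable.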
Related papers
- UniMuMo: Unified Text, Music and Motion Generation [57.72514622935806]
We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
By converting music, motion, and text into a token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture (a shared-vocabulary sketch follows this entry).
arXiv Detail & Related papers (2024-10-06T16:04:05Z)
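One common way to realize such a token-based bridge is to give each modality a disjoint id range in a shared vocabulary, so a single encoder-decoder transformer can embed all three. The vocabulary sizes below are made-up placeholders; UniMuMo's actual tokenizers and scheme may differ.

```python
# Hypothetical per-modality vocabulary sizes; real values come from each tokenizer.
VOCAB_SIZES = {"text": 32000, "music": 2048, "motion": 512}

# Assign each modality a disjoint id range so one embedding table serves all three.
OFFSETS, _total = {}, 0
for _name, _size in VOCAB_SIZES.items():
    OFFSETS[_name] = _total
    _total += _size

def to_unified(modality: str, token_ids: list[int]) -> list[int]:
    """Shift modality-local token ids into the shared vocabulary."""
    offset = OFFSETS[modality]
    return [offset + t for t in token_ids]

print(to_unified("music", [0, 5, 17]))   # [32000, 32005, 32017]
print(to_unified("motion", [3]))         # [34051]
```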
- PerTok: Expressive Encoding and Modeling of Symbolic Musical Ideas and Variations [0.3683202928838613]
Cadenza is a new multi-stage generative framework for predicting expressive variations of symbolic musical ideas.
The proposed framework comprises two sequential stages: 1) Composer and 2) Performer.
Our framework is designed, researched, and implemented with the objective of providing inspiration for musicians.
arXiv Detail & Related papers (2024-10-02T22:11:31Z)
- YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation [15.9795868183084]
Multi-instrument music transcription aims to convert polyphonic music recordings into musical scores assigned to each instrument.
This paper introduces YourMT3+, a suite of models for enhanced multi-instrument music transcription.
Our experiments demonstrate direct vocal transcription capabilities, eliminating the need for voice separation pre-processors.
arXiv Detail & Related papers (2024-07-05T19:18:33Z)
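The cross-dataset stem augmentation named in the title can be pictured as mixing instrument stems drawn from different corpora into new polyphonic training examples, as in the rough sketch below; the pool structure, naive summation, and label format are illustrative assumptions, not YourMT3+'s pipeline.

```python
import random
import numpy as np

def stem_augment(stem_pools, max_stems=3, seed=0):
    """Mix stems from different sources into one polyphonic training example.
    `stem_pools` maps instrument -> list of (audio, note_events) stems, where
    note_events is a list of (onset_seconds, midi_pitch) pairs."""
    rng = random.Random(seed)
    instruments = rng.sample(sorted(stem_pools), k=min(max_stems, len(stem_pools)))
    stems = [rng.choice(stem_pools[inst]) for inst in instruments]
    mix = sum(audio for audio, _ in stems)   # naive sum; real code would normalize
    labels = sorted(event for _, events in stems for event in events)
    return mix, labels

pools = {
    "piano": [(np.ones(4), [(0.0, 60), (0.5, 64)])],
    "violin": [(np.full(4, 0.5), [(0.25, 67)])],
}
mix, labels = stem_augment(pools)
print(mix, labels)   # mixed waveform and merged, time-sorted note labels
```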
- C3LLM: Conditional Multimodal Content Generation Using Large Language Models [66.11184017840688]
We introduce C3LLM, a novel framework combining the three tasks of video-to-audio, audio-to-text, and text-to-audio generation.
C3LLM adapts the Large Language Model (LLM) structure as a bridge for aligning different modalities.
Our method combines the previous tasks of audio understanding, video-to-audio generation, and text-to-audio generation into one unified model.
arXiv Detail & Related papers (2024-05-25T09:10:12Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation); a toy illustration of the measure alignment follows this entry.
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
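A rough illustration of the measure-alignment idea referenced above: emit bar i of every voice before bar i+1 of any voice, so measures stay synchronized across tracks. The inline `[V:n]` voice markers are standard ABC notation, but the concrete SMT-ABC syntax in MuPT may differ.

```python
def synchronize_tracks(tracks):
    """Interleave bars measure by measure so bar i of every voice is emitted
    before bar i+1 of any voice, keeping multi-track measures aligned."""
    assert len({len(t) for t in tracks}) == 1, "all tracks need the same bar count"
    out = []
    for bar in range(len(tracks[0])):
        for voice, track in enumerate(tracks, start=1):
            out.append(f"[V:{voice}] {track[bar]} |")
    return " ".join(out)

violin = ["C2 E2 G2", "A2 G2 F2"]
cello  = ["C,4 G,2", "F,4 C,2"]
print(synchronize_tracks([violin, cello]))
# [V:1] C2 E2 G2 | [V:2] C,4 G,2 | [V:1] A2 G2 F2 | [V:2] F,4 C,2 |
```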
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
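As a minimal sketch of a hybrid two-view variational strategy, each view can contribute its own reparameterized latent, with the decoder conditioning on their concatenation. The dimensions and zero-valued encoder outputs below are placeholders; the actual Multi-view MidiVAE architecture is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Standard VAE trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Hypothetical outputs of a track-view encoder and a bar-view encoder.
mu_track, logvar_track = np.zeros(16), np.zeros(16)
mu_bar, logvar_bar = np.zeros(16), np.zeros(16)

z_track = reparameterize(mu_track, logvar_track)  # track-level (global) info
z_bar = reparameterize(mu_bar, logvar_bar)        # bar-level (local) info
z = np.concatenate([z_track, z_bar])              # fused latent fed to the decoder
print(z.shape)  # (32,)
```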
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns (sketched after this entry).
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
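The token interleaving can be illustrated with the "delay" pattern described in the MusicGen paper: codebook stream k is shifted right by k steps so a single-stage LM can predict all streams jointly. The PAD handling below is a simplification of the real special-token scheme.

```python
PAD = -1  # placeholder id for positions not yet carrying a code

def delay_interleave(codes):
    """Apply a MusicGen-style 'delay' pattern: stream k is shifted right by
    k steps. `codes` is a K x T list of ids (K codebook streams of length T)."""
    K, T = len(codes), len(codes[0])
    out = [[PAD] * (T + K - 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            out[k][t + k] = codes[k][t]
    return out

# 3 codebook streams of 4 steps each
streams = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
for row in delay_interleave(streams):
    print(row)
# [1, 2, 3, 4, -1, -1]
# [-1, 5, 6, 7, 8, -1]
# [-1, -1, 9, 10, 11, 12]
```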
- From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation [1.9188864062289432]
Subword tokenization has been widely successful in text-based natural language processing tasks with Transformer-based models.
We apply subword tokenization on top of musical tokenization schemes and find that it enables the generation of longer songs in the same amount of time.
Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition.
arXiv Detail & Related papers (2023-04-18T12:46:12Z)
- An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition [24.377724078096144]
Piano and orchestral scores frequently exhibit polyphonic passages, which add a second dimension to the task.
We propose two novel formulations for end-to-end polyphonic OMR.
We observe a new state-of-the-art performance with our multi-sequence detection decoder, RNNDecoder.
arXiv Detail & Related papers (2021-08-03T22:04:40Z)
- Spectrogram Inpainting for Interactive Generation of Instrument Sounds [1.7205106391379026]
We cast the generation of individual instrumental notes as an inpainting-based task, introducing novel and unique ways to iteratively shape sounds.
Most crucially, we open-source an interactive web interface for transforming sounds by inpainting, aimed at artists and practitioners alike and opening up new, creative uses (the masking step is sketched after this entry).
arXiv Detail & Related papers (2021-04-15T15:17:31Z)
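The inpainting framing amounts to masking a time-frequency patch that a generative model then refills conditioned on the untouched surroundings. The raw-magnitude array and region API below are illustrative assumptions rather than the paper's actual interface.

```python
import numpy as np

def mask_region(spec, t0, t1, f0, f1):
    """Zero out a time-frequency patch; an inpainting model regenerates the
    masked cells conditioned on the untouched surroundings."""
    masked = spec.copy()
    mask = np.zeros_like(spec, dtype=bool)
    mask[f0:f1, t0:t1] = True
    masked[mask] = 0.0
    return masked, mask

spec = np.random.rand(128, 64)                 # toy: 128 mel bins x 64 frames
masked, mask = mask_region(spec, t0=20, t1=30, f0=40, f1=80)
print(mask.sum(), masked[40:80, 20:30].max())  # 400 0.0
```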
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.