Calliope -- A Polyphonic Music Transformer
- URL: http://arxiv.org/abs/2107.05546v1
- Date: Thu, 8 Jul 2021 08:18:57 GMT
- Title: Calliope -- A Polyphonic Music Transformer
- Authors: Andrea Valenti, Stefano Berti, Davide Bacciu
- Abstract summary: We present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music.
Experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation.
- Score: 9.558051115598657
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The polyphonic nature of music makes the application of deep learning to
music modelling a challenging task. On the other hand, the Transformer
architecture seems to be a good fit for this kind of data. In this work, we
present Calliope, a novel autoencoder model based on Transformers for the
efficient modelling of multi-track sequences of polyphonic music. The
experiments show that our model is able to improve the state of the art on
musical sequence reconstruction and generation, with remarkably good results
especially on long sequences.
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context.
We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture.
The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z)
- Multitrack Music Transcription with a Time-Frequency Perceiver [6.617487928813374]
Multitrack music transcription aims to transcribe a music audio input into the musical notes of multiple instruments simultaneously.
We propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of audio input for multitrack transcription.
arXiv Detail & Related papers (2023-06-19T08:58:26Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Multi-Genre Music Transformer -- Composing Full Length Musical Piece [0.0]
The objective of the project is to implement a Multi-Genre Transformer which learns to produce music pieces through a more adaptive learning process.
We built a multi-genre compound word dataset and implemented a linear transformer trained on this dataset.
We call this the Multi-Genre Transformer; it was able to generate full-length new musical pieces that are diverse and comparable to original tracks.
arXiv Detail & Related papers (2023-01-06T05:27:55Z)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation [6.0949335132843965]
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses coarse-decoder and fine-decoders to model the contexts at the global and section-level.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
arXiv Detail & Related papers (2022-05-17T18:48:14Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex and harmonious symphony compared to human composition.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions [37.66340344198797]
We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music.
arXiv Detail & Related papers (2020-02-01T14:12:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.