Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
- URL: http://arxiv.org/abs/2002.00212v3
- Date: Mon, 10 Aug 2020 07:27:05 GMT
- Title: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions
- Authors: Yu-Siang Huang, Yi-Hsuan Yang
- Abstract summary: We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A great number of deep-learning-based models have been recently proposed for
automatic music composition. Among these models, the Transformer stands out as
a prominent approach for generating expressive classical piano performance with
a coherent structure of up to one minute. The model is powerful in that it
learns abstractions of data on its own, without much human-imposed domain
knowledge or constraints. In contrast with this general approach, this paper
shows that Transformers can do even better for music modeling, when we improve
the way a musical score is converted into the data fed to a Transformer model.
In particular, we seek to impose a metrical structure in the input data, so
that Transformers can be more easily aware of the beat-bar-phrase hierarchical
structure in music. The new data representation maintains the flexibility of
local tempo changes, and provides handles to control the rhythmic and harmonic
structure of music. With this approach, we build a Pop Music Transformer that
composes Pop piano music with better rhythmic structure than existing
Transformer models.
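The abstract describes the representation only at a high level. To make "imposing a metrical structure on the input data" concrete, here is a minimal Python sketch that places notes on a bar/position grid and emits per-bar Tempo and Chord tokens. It assumes 4/4 time and a 16-step grid per bar; the token names, the `Note`/`BarInfo` types, and the `to_beat_based_tokens` helper are hypothetical choices for this illustration and not necessarily the paper's exact vocabulary. Any quantization of velocity, tempo, or duration that the paper may apply is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumptions for this sketch (not taken from the abstract): 4/4 time, 16 grid steps per bar.
BEATS_PER_BAR = 4
POSITIONS_PER_BAR = 16
STEPS_PER_BEAT = POSITIONS_PER_BAR // BEATS_PER_BAR


@dataclass
class Note:
    start: float     # onset in beats from the start of the piece
    duration: float  # length in beats
    pitch: int       # MIDI pitch number, 0-127
    velocity: int    # MIDI velocity, 1-127


@dataclass
class BarInfo:
    tempo: Optional[int] = None  # hypothetical per-bar tempo in BPM
    chord: Optional[str] = None  # hypothetical per-bar chord label, e.g. "C:maj"


def to_beat_based_tokens(notes: List[Note], bars: List[BarInfo]) -> List[str]:
    """Encode notes as Bar/Position-style tokens on a metrical grid (illustrative only)."""
    # Snap each onset to the grid and bucket notes by (bar, position).
    grid: Dict[int, Dict[int, List[Note]]] = {}
    for n in sorted(notes, key=lambda x: (x.start, x.pitch)):
        step = round(n.start * STEPS_PER_BEAT)
        bar, pos = divmod(step, POSITIONS_PER_BAR)
        grid.setdefault(bar, {}).setdefault(pos, []).append(n)

    n_bars = max(len(bars), max(grid, default=-1) + 1)
    tokens: List[str] = []
    for b in range(n_bars):
        tokens.append("Bar")  # every bar is marked explicitly, even an empty one
        info = bars[b] if b < len(bars) else BarInfo()
        cells = grid.get(b, {})
        has_meta = info.tempo is not None or info.chord is not None
        # Position 0 is emitted whenever the bar carries a tempo or chord token.
        positions = sorted(set(cells) | ({0} if has_meta else set()))
        for pos in positions:
            tokens.append(f"Position_{pos + 1}/{POSITIONS_PER_BAR}")
            if pos == 0 and info.tempo is not None:
                tokens.append(f"Tempo_{info.tempo}")
            if pos == 0 and info.chord is not None:
                tokens.append(f"Chord_{info.chord}")
            for n in cells.get(pos, []):
                # Duration is counted in grid steps (16th notes here), at least one step.
                dur_steps = max(1, round(n.duration * STEPS_PER_BEAT))
                tokens += [f"Velocity_{n.velocity}", f"Note-On_{n.pitch}",
                           f"Note-Duration_{dur_steps}"]
    return tokens


if __name__ == "__main__":
    bars = [BarInfo(tempo=120, chord="C:maj"), BarInfo(chord="G:maj")]
    notes = [Note(0.0, 1.0, 60, 80), Note(0.0, 1.0, 64, 80),
             Note(2.0, 2.0, 67, 72), Note(4.5, 0.5, 71, 90)]
    print(to_beat_based_tokens(notes, bars))
```

In the example, the output begins with `Bar`, `Position_1/16`, `Tempo_120`, `Chord_C:maj`, and the note at beat 4.5 lands in the second bar as `Position_3/16`. Onsets are stored as positions within a bar rather than as absolute time shifts, so a `Tempo` token can change from one bar to the next without moving any note off the grid. The `Tempo` and `Chord` tokens are also explicit symbols that a model can be conditioned on.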
Related papers
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
(2024-07-05)
- Grokking of Hierarchical Structure in Vanilla Transformers
We show that transformer language models can learn to generalize hierarchically after training for extremely long periods.
Intermediate-depth models generalize better than both very deep and very shallow transformers.
(2023-05-30)
- Melody Infilling with User-Provided Structural Context
This paper proposes a novel Transformer-based model for music score infilling.
We show that the proposed model can harness the structural information effectively and generate higher-quality melodies in the style of pop.
(2022-10-06)
- Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach
We devise a two-stage Transformer-based framework that Composes a lead sheet first, and then Embellishes it with accompaniment and expressive touches.
Our objective and subjective experiments show that Compose & Embellish shrinks the gap in structureness between a current state of the art and real performances by half, and improves other musical aspects such as richness and coherence as well.
(2022-09-17)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
(2022-08-11)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses coarse-decoder and fine-decoders to model the contexts at the global and section-level.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
(2022-05-17)
- Calliope -- A Polyphonic Music Transformer
We present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music.
Experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation.
(2021-07-08)
- Parameter Efficient Multimodal Transformers for Video Representation Learning
This work focuses on reducing the parameters of multimodal Transformers in the context of audio-visual video representation learning.
We show that our approach reduces parameters by up to 80%, allowing us to train our model end-to-end from scratch.
To demonstrate our approach, we pretrain our model on 30-second clips from Kinetics-700 and transfer it to audio-visual classification tasks.
(2020-12-08)
- Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems
We show that we can obtain a neural machine translation model that works at the character level without requiring token segmentation.
Our study is a significant step towards high-performance and easy to train character-based models that are not extremely large.
(2020-04-29)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
(2020-01-15)