Do we need more complex representations for structure? A comparison of note duration representation for Music Transformers
- URL: http://arxiv.org/abs/2410.10515v1
- Date: Mon, 14 Oct 2024 13:53:11 GMT
- Authors: Gabriel Souza, Flavio Figueiredo, Alexei Machado, Deborah Guimarães
- Abstract summary: In this work, we inquire if the off-the-shelf Music Transformer models perform just as well on structural similarity metrics using only unannotated MIDI information.
We show that a slight tweak to the most common representation yields small but significant improvements.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, deep learning has achieved formidable results in creative computing. For music, Transformer-based models are one viable approach to music generation. However, while Transformer models are popular for music generation, they often rely on annotated structural information. In this work, we ask whether off-the-shelf Music Transformer models perform just as well on structural similarity metrics using only unannotated MIDI information. We show that a slight tweak to the most common representation yields small but significant improvements. We also advocate that searching for better unannotated musical representations is more cost-effective than producing large amounts of curated and annotated data.
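To make the notion of "note duration representation" concrete, the following sketch contrasts two common tokenization schemes for symbolic music: MIDI-like Note-On/Note-Off events separated by Time-Shift tokens, and a REMI-style scheme (as in the Pop Music Transformer paper listed below) where each note carries an explicit Duration token. The event names and tick values here are simplified illustrations, not the exact vocabulary used in the paper.

```python
# Illustrative sketch of two note-duration encodings for symbolic music.
# Each note is (onset_tick, pitch, duration_ticks); names are hypothetical.
notes = [(0, 60, 480), (480, 64, 240), (720, 67, 240)]

def midi_like_tokens(notes):
    """MIDI-like: Note-On/Note-Off events separated by Time-Shift tokens."""
    events = []  # (tick, priority, token); Note-Off sorts before Note-On at a tick
    for onset, pitch, dur in notes:
        events.append((onset, 1, f"NoteOn_{pitch}"))
        events.append((onset + dur, 0, f"NoteOff_{pitch}"))
    events.sort()
    tokens, now = [], 0
    for tick, _, tok in events:
        if tick > now:
            tokens.append(f"TimeShift_{tick - now}")
            now = tick
        tokens.append(tok)
    return tokens

def duration_tokens(notes):
    """REMI-style: each Note-On is immediately followed by a Duration token."""
    tokens, now = [], 0
    for onset, pitch, dur in sorted(notes):
        if onset > now:
            tokens.append(f"TimeShift_{onset - now}")
            now = onset
        tokens += [f"NoteOn_{pitch}", f"Duration_{dur}"]
    return tokens

print(midi_like_tokens(notes))
print(duration_tokens(notes))
```

The duration-token scheme keeps a note's length local to its onset, whereas the Note-Off scheme forces the model to match distant On/Off pairs; differences of this kind are what representation comparisons like this paper's measure.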
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer [4.866650264773479]
We propose a polyphonic music generation neural network named Choir Transformer.
The performance of Choir Transformer surpasses the previous state of the art, improving accuracy by 4.06%.
In practical application, the generated melody and rhythm can be adjusted according to the specified input.
arXiv Detail & Related papers (2023-08-01T06:44:15Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles [65.54857068975068]
In this paper, we argue that this additional bulk is unnecessary.
By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer.
We create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models.
arXiv Detail & Related papers (2023-06-01T17:59:58Z) - Museformer: Transformer with Fine- and Coarse-Grained Attention for
Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
arXiv Detail & Related papers (2022-10-19T07:31:56Z) - The Power of Reuse: A Multi-Scale Transformer Model for Structural
Dynamic Segmentation in Symbolic Music Generation [6.0949335132843965]
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses coarse-decoder and fine-decoders to model the contexts at the global and section-level.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
arXiv Detail & Related papers (2022-05-17T18:48:14Z) - Calliope -- A Polyphonic Music Transformer [9.558051115598657]
We present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music.
Experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation.
arXiv Detail & Related papers (2021-07-08T08:18:57Z) - PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z) - Synthesizer: Rethinking Self-Attention in Transformer Models [93.08171885200922]
Dot-product self-attention is central and indispensable to state-of-the-art Transformer models.
This paper investigates the true importance and contribution of the dot product-based self-attention mechanism on the performance of Transformer models.
arXiv Detail & Related papers (2020-05-02T08:16:19Z) - Pop Music Transformer: Beat-based Modeling and Generation of Expressive
Pop Piano Compositions [37.66340344198797]
We build a Pop Music Transformer that composes Pop piano music with better rhythmic structure than existing Transformer models.
In particular, we seek to impose a metrical structure in the input data, so that Transformers can be more easily aware of the beat-bar-phrase hierarchical structure in music.
arXiv Detail & Related papers (2020-02-01T14:12:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.