Museformer: Transformer with Fine- and Coarse-Grained Attention for
Music Generation
- URL: http://arxiv.org/abs/2210.10349v1
- Date: Wed, 19 Oct 2022 07:31:56 GMT
- Title: Museformer: Transformer with Fine- and Coarse-Grained Attention for
Music Generation
- Authors: Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang,
Tao Qin, Tie-Yan Liu
- Abstract summary: We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token in a given bar attends directly to all the tokens of the bars that are most relevant to musical structure.
With the coarse-grained attention, a token attends only to the summarization of the other bars rather than to each of their tokens, so as to reduce the computational cost.
- Score: 138.74751744348274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Symbolic music generation aims to generate music scores automatically. A
recent trend is to use Transformer or its variants in music generation, which
is, however, suboptimal, because the full attention cannot efficiently model
the typically long music sequences (e.g., over 10,000 tokens), and the existing
models have shortcomings in generating musical repetition structures. In this
paper, we propose Museformer, a Transformer with a novel fine- and
coarse-grained attention for music generation. Specifically, with the
fine-grained attention, a token of a specific bar directly attends to all the
tokens of the bars that are most relevant to music structures (e.g., the
previous 1st, 2nd, 4th and 8th bars, selected via similarity statistics); with
the coarse-grained attention, a token attends only to the summarization of the
other bars rather than to each of their tokens, so as to reduce the computational
cost. The advantages are two-fold. First, it can capture both music
structure-related correlations via the fine-grained attention, and other
contextual information via the coarse-grained attention. Second, it is
efficient and can model over 3X longer music sequences compared to its
full-attention counterpart. Both objective and subjective experimental results
demonstrate its ability to generate long music sequences with high quality and
better structures.
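To make the two attention granularities concrete, below is a minimal Python sketch of the kind of attention mask the abstract describes. It is not the authors' implementation: the function name museformer_style_mask, the end-of-sequence placement of summary tokens, and the default offsets (1, 2, 4, 8) are illustrative assumptions based only on the abstract.

```python
# Minimal sketch (assumption-laden, not the paper's code) of a combined
# fine-/coarse-grained attention mask. Layout: all ordinary tokens first,
# then one summary token per bar appended at the end.
import numpy as np

def museformer_style_mask(bar_of_token, num_bars, related_offsets=(1, 2, 4, 8)):
    """Return mask[q, k] = True where query q may attend to key k."""
    n_tok = len(bar_of_token)
    n = n_tok + num_bars
    mask = np.zeros((n, n), dtype=bool)

    for q in range(n_tok):
        bq = bar_of_token[q]
        # Structure-related bars: the current bar plus the previous
        # 1st, 2nd, 4th and 8th bars (as quoted in the abstract).
        related = {bq} | {bq - d for d in related_offsets if bq - d >= 0}
        for k in range(q + 1):                # causal over ordinary tokens
            if bar_of_token[k] in related:    # fine-grained: token-to-token
                mask[q, k] = True
        for b in range(bq):                   # coarse-grained: token-to-summary
            if b not in related:
                mask[q, n_tok + b] = True

    for b in range(num_bars):                 # each summary reads its own bar
        for k in range(n_tok):
            if bar_of_token[k] == b:
                mask[n_tok + b, k] = True
    return mask

# Example: 3 bars of two tokens each.
m = museformer_style_mask([0, 0, 1, 1, 2, 2], num_bars=3)
```

In an actual autoregressive model the bar summaries would need to be interleaved with the sequence for generation to remain causal; appending them at the end here simply keeps the sketch short.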
Related papers
- BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features [19.284531698181116]
BandControlNet is designed to handle multiple music sequences and generate high-quality music samples conditioned on the given spatiotemporal control features.
The proposed BandControlNet outperforms other conditional music generation models on most objective metrics in terms of fidelity and inference speed.
The subjective evaluations show that models trained on short datasets can generate music of quality comparable to state-of-the-art models, while BandControlNet outperforms them significantly.
arXiv Detail & Related papers (2024-07-15T06:33:25Z) - MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z) - Impact of time and note duration tokenizations on deep learning symbolic
music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit information leads to better results depending on the task.
arXiv Detail & Related papers (2023-10-12T16:56:37Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations (a rough sketch of the GETScore layout follows this list).
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - From Words to Music: A Study of Subword Tokenization Techniques in
Symbolic Music Generation [1.9188864062289432]
Subword tokenization has been widely successful in text-based natural language processing tasks with Transformer-based models.
We apply subword tokenization on top of existing musical tokenization schemes and find that it enables the generation of longer songs within the same generation time.
Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition.
arXiv Detail & Related papers (2023-04-18T12:46:12Z) - MusIAC: An extensible generative framework for Music Infilling
Applications with multi-level Control [11.811562596386253]
Infilling refers to the task of generating musical sections given the surrounding multi-track music.
The proposed framework is extensible to new control tokens; the control tokens added in this work include tonal tension per bar and track polyphony level.
We present the model in a Google Colab notebook to enable interactive generation.
arXiv Detail & Related papers (2022-02-11T10:02:21Z) - MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to understanding music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z) - MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just
One Transformer VAE [36.9033909878202]
Transformers and variational autoencoders (VAEs) have been extensively employed for symbolic (e.g., MIDI) domain music generation.
In this paper, we are interested in bringing the two together to construct a single model that exhibits both strengths.
Experiments show that MuseMorphose outperforms recurrent neural network (RNN) based prior art on numerous widely-used metrics for style transfer tasks.
arXiv Detail & Related papers (2021-05-10T03:44:03Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
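As referenced above, here is a rough, hypothetical illustration of a GETScore-like 2D token grid (tracks stacked vertically, time progressing horizontally). The track names, token ids, and MASK-based infilling below are illustrative assumptions, not GETMusic's actual vocabulary.

```python
# Hypothetical sketch of a GETScore-like 2D token grid (not GETMusic's
# actual format): one row per track, one column per time step.
import numpy as np

PAD, MASK = 0, 1                        # assumed special token ids
tracks = ["melody", "bass", "drums"]    # invented track names
T = 16                                  # 16 time steps
score = np.full((len(tracks), T), PAD, dtype=np.int64)

score[0, :4] = [101, 102, 102, 103]     # made-up melody tokens at steps 0-3
score[1, :4] = [201, PAD, 202, PAD]     # bass enters on steps 0 and 2

# To generate the drum track given melody and bass, the target row is
# masked; a non-autoregressive model fills all MASK cells jointly.
score[2, :] = MASK
```

Because the model is non-autoregressive, all masked cells of the target rows can be predicted jointly rather than strictly left to right, which is what allows arbitrary source-target track combinations.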
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.