A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention
Mechanism for Symbolic Music Modeling
- URL: http://arxiv.org/abs/2212.00973v1
- Date: Fri, 2 Dec 2022 05:04:31 GMT
- Title: A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention
Mechanism for Symbolic Music Modeling
- Authors: Z. Guo, J. Kang, D. Herremans
- Abstract summary: We propose the Fundamental Music Embedding (FME) for symbolic music based on a bias-adjusted sinusoidal encoding.
Taking advantage of the proposed FME, we propose a novel attention mechanism based on the relative index, pitch and onset embeddings.
Experimental results show that our proposed RIPO transformer outperforms state-of-the-art transformers.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Following the success of the transformer architecture in the natural language
domain, transformer-like architectures have been widely applied to the domain
of symbolic music recently. Symbolic music and text, however, are two different
modalities. Symbolic music contains multiple attributes, both absolute
attributes (e.g., pitch) and relative attributes (e.g., pitch interval). These
relative attributes shape human perception of musical motifs. These important
relative attributes, however, are mostly ignored in existing symbolic music
modeling methods, mainly because of the lack of a musically meaningful
embedding space in which both the absolute and relative embeddings of symbolic
music tokens can be efficiently represented. In this paper, we propose the
Fundamental Music Embedding (FME) for symbolic music based on a bias-adjusted
sinusoidal encoding within which both the absolute and the relative attributes
can be embedded and the fundamental musical properties (e.g., translational
invariance) are explicitly preserved. Taking advantage of the proposed FME, we
further propose a novel attention mechanism based on the relative index, pitch
and onset embeddings (RIPO attention) such that the musical domain knowledge
can be fully utilized for symbolic music modeling. Experimental results show that
our proposed model, the RIPO transformer, which utilizes FME and RIPO attention,
outperforms state-of-the-art transformers (i.e., the Music Transformer and the
Linear Transformer) in a melody completion task. Moreover, when using the RIPO
transformer in a downstream music generation task, we observe that the notorious
degeneration phenomenon no longer occurs, and the music generated by the RIPO
transformer outperforms that of state-of-the-art transformer models in both
subjective and objective evaluations.
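To make these two components concrete, below is a minimal, illustrative PyTorch sketch of (a) a bias-adjusted sinusoidal embedding in the spirit of FME and (b) a single-head attention whose logits add relative-index, relative-pitch, and relative-onset terms in the spirit of RIPO attention. Class names, hyperparameters, the single-head simplification, and the exact placement of the learnable bias are our assumptions, not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn


class FundamentalMusicEmbedding(nn.Module):
    """Sketch of a bias-adjusted sinusoidal embedding (FME-style).

    A scalar attribute x (e.g., MIDI pitch or onset) is encoded by sin/cos
    at geometrically spaced frequencies, then shifted by a learnable bias.
    Since sin/cos of (x + k) is a fixed rotation of sin/cos of x, relative
    attributes such as pitch intervals remain linearly recoverable:
    translational invariance is preserved by construction.
    """

    def __init__(self, d_model: int = 64, base: float = 10000.0):
        super().__init__()
        freqs = torch.exp(
            -math.log(base) * torch.arange(0, d_model, 2).float() / d_model
        )
        self.register_buffer("freqs", freqs)            # fixed frequencies
        self.bias = nn.Parameter(torch.zeros(d_model))  # "bias adjustment"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (...) float attribute values, e.g., pitches in [0, 127]
        angles = x.unsqueeze(-1) * self.freqs           # (..., d_model/2)
        emb = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return emb + self.bias                          # (..., d_model)


class RIPOAttention(nn.Module):
    """Sketch of relative index/pitch/onset (RIPO-style) attention.

    Single head for clarity: the content logit q_i . k_j is augmented
    with q_i . r_ij for each relative attribute embedding r_ij.
    """

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.rel_index = FundamentalMusicEmbedding(d_model)
        self.rel_pitch = FundamentalMusicEmbedding(d_model)
        self.rel_onset = FundamentalMusicEmbedding(d_model)

    def forward(self, h, pitch, onset):
        # h: (B, T, d) token states; pitch, onset: (B, T) float attributes
        B, T, d = h.shape
        q, k, v = self.q_proj(h), self.k_proj(h), self.v_proj(h)
        logits = torch.einsum("bid,bjd->bij", q, k)
        idx = torch.arange(T, device=h.device, dtype=h.dtype)
        rels = [
            self.rel_index(idx[None, :, None] - idx[None, None, :]),
            self.rel_pitch(pitch[:, :, None] - pitch[:, None, :]),
            self.rel_onset(onset[:, :, None] - onset[:, None, :]),
        ]
        for r in rels:  # add query-relative terms, as in relative attention
            logits = logits + torch.einsum(
                "bid,bijd->bij", q, r.expand(B, T, T, d)
            )
        attn = torch.softmax(logits / math.sqrt(d), dim=-1)
        return torch.einsum("bij,bjd->bid", attn, v)
```

In this sketch, embedding a C4 and an E4 (MIDI 60 and 64) yields vectors whose relationship encodes the major-third interval regardless of absolute transposition, which is the translational invariance the abstract refers to.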
Related papers
- Music102: An $D_{12}$-equivariant transformer for chord progression accompaniment
Music102 enhances chord progression accompaniment through a D12-equivariant transformer.
By encoding prior music knowledge, the model maintains equivariance across both melody and chord sequences.
This work showcases the adaptability of self-attention mechanisms and layer normalization to the discrete musical domain.
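As a toy illustration (ours, not the paper's code) of what $D_{12}$-equivariance means here: the dihedral group $D_{12}$ acts on the 12 pitch classes by transposition (rotation) and inversion (reflection), and an equivariant layer f must commute with that action, f(g . x) = g . f(x).

```python
import torch

def transpose(pc: torch.Tensor, k: int) -> torch.Tensor:
    # Rotate a distribution over the 12 pitch classes by k semitones.
    return torch.roll(pc, shifts=k, dims=-1)

def invert(pc: torch.Tensor) -> torch.Tensor:
    # Reflect pitch classes: c -> (-c) mod 12.
    return pc[..., (-torch.arange(12)) % 12]

def is_transposition_equivariant(f, x: torch.Tensor, k: int = 3) -> bool:
    # Equivariance check: f(transpose(x)) == transpose(f(x)).
    return torch.allclose(f(transpose(x, k)), transpose(f(x), k), atol=1e-5)

# Any circular convolution over pitch classes passes this check, e.g.:
#   f = lambda x: torch.roll(x, 1, dims=-1)
#   is_transposition_equivariant(f, torch.rand(12))  # True
```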
arXiv Detail & Related papers (2024-10-23T03:11:01Z)
- MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT
We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation.
To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator and incorporate a relativistic standard loss (sketched below).
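For reference, the relativistic standard GAN loss (Jolicoeur-Martineau, 2018) scores real samples against fakes directly; a minimal sketch with our own variable names, not the authors' code:

```python
import torch
import torch.nn.functional as F

def rsgan_d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Discriminator: real logits should exceed fake logits.
    return F.binary_cross_entropy_with_logits(
        d_real - d_fake, torch.ones_like(d_real)
    )

def rsgan_g_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Generator: fake logits should exceed real logits.
    return F.binary_cross_entropy_with_logits(
        d_fake - d_real, torch.ones_like(d_fake)
    )
```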
arXiv Detail & Related papers (2024-09-02T03:18:56Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token attends only to a summarization of the other bars rather than to each of their tokens, which reduces the computational cost (see the mask sketch below).
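A rough sketch of the idea behind such a fine/coarse attention mask (purely illustrative; Museformer's actual implementation differs in detail):

```python
import torch

def museformer_style_mask(bar_ids: torch.Tensor, related: dict) -> torch.Tensor:
    """Boolean mask over [T tokens + one summary slot per bar].

    Token i attends token-by-token (fine-grained) to bars that are
    structure-related to its own bar, and only to the single summary
    slot (coarse-grained) of every other bar, keeping the cost well
    below full attention over all T x T token pairs.
    """
    T = int(bar_ids.shape[0])
    bars = sorted(set(bar_ids.tolist()))
    mask = torch.zeros(T, T + len(bars), dtype=torch.bool)
    for i in range(T):
        fine = related[int(bar_ids[i])]      # bars token i sees in full
        for j in range(T):
            mask[i, j] = int(bar_ids[j]) in fine
        for s, b in enumerate(bars):         # coarse: summary slots only
            mask[i, T + s] = b not in fine
    return mask

# e.g., 2 bars of 3 tokens each, each bar fine-attending only to itself:
#   museformer_style_mask(torch.tensor([0, 0, 0, 1, 1, 1]),
#                         {0: {0}, 1: {1}})
```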
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- The Power of Reuse: A Multi-Scale Transformer Model for Structural Dynamic Segmentation in Symbolic Music Generation
Symbolic Music Generation relies on the contextual representation capabilities of the generative model.
We propose a multi-scale Transformer, which uses a coarse decoder and fine decoders to model contexts at the global and section levels.
Our model is evaluated on two open MIDI datasets, and experiments show that our model outperforms the best contemporary symbolic music generative models.
arXiv Detail & Related papers (2022-05-17T18:48:14Z)
- Signal-domain representation of symbolic music for learning embedding spaces
We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features (a toy rendering is sketched below).
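One plausible toy rendering of "score as continuous signal" (our illustration; the paper's transform may differ): each note becomes a sinusoid at its fundamental frequency over its duration.

```python
import numpy as np

def notes_to_signal(notes, sr: int = 16000) -> np.ndarray:
    # notes: list of (midi_pitch, onset_sec, duration_sec) triples
    total = max(onset + dur for _, onset, dur in notes)
    signal = np.zeros(int(total * sr))
    for pitch, onset, dur in notes:
        freq = 440.0 * 2.0 ** ((pitch - 69) / 12.0)   # MIDI -> Hz
        t = np.arange(int(dur * sr)) / sr
        start = int(onset * sr)
        signal[start:start + t.size] += np.sin(2 * np.pi * freq * t)
    return signal

# e.g., a C major triad held for one second:
#   sig = notes_to_signal([(60, 0.0, 1.0), (64, 0.0, 1.0), (67, 0.0, 1.0)])
```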
arXiv Detail & Related papers (2021-09-08T06:36:02Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
Symbolic music understanding refers to understanding music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Music Gesture for Visual Sound Separation
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model achieves higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)