MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
- URL: http://arxiv.org/abs/2106.05630v1
- Date: Thu, 10 Jun 2021 10:13:05 GMT
- Title: MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
- Authors: Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu
- Abstract summary: Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
- Score: 97.91071692716406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic music understanding, which refers to the understanding of music from
the symbolic data (e.g., MIDI format, but not audio), covers many music
applications such as genre classification, emotion classification, and music
pieces matching. While good music representations are beneficial for these
applications, the lack of training data hinders representation learning.
Inspired by the success of pre-training models in natural language processing,
in this paper, we develop MusicBERT, a large-scale pre-trained model for music
understanding. To this end, we construct a large-scale symbolic music corpus
that contains more than 1 million songs. Since symbolic music contains more
structural information (e.g., bar, position) and more diverse information
(e.g., tempo, instrument, and pitch) than text, simply adopting pre-training
techniques from NLP brings only marginal gains. Therefore, we design several
mechanisms, including OctupleMIDI encoding and bar-level masking strategy, to
enhance pre-training with symbolic music data. Experiments demonstrate the
advantages of MusicBERT on four music understanding tasks, including melody
completion, accompaniment suggestion, genre classification, and style
classification. Ablation studies also verify the effectiveness of our designs
of OctupleMIDI encoding and bar-level masking strategy in MusicBERT.
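The two mechanisms named in the abstract are easiest to see in code. The sketch below illustrates both under stated assumptions: each note becomes a single 8-element token (OctupleMIDI), and masking is applied per (bar, element-type) pair rather than per token (bar-level masking). The field names follow the paper's description, but the mask id and masking rate are illustrative, not the paper's exact implementation.

```python
from dataclasses import dataclass
import random

# Each note is one token with eight elements, per the paper's description
# of OctupleMIDI; exact vocabularies and value ranges are assumptions.
@dataclass
class OctupleToken:
    time_signature: int
    tempo: int
    bar: int         # index of the bar containing the note
    position: int    # onset position within the bar
    instrument: int
    pitch: int
    duration: int
    velocity: int

ELEMENTS = ["time_signature", "tempo", "bar", "position",
            "instrument", "pitch", "duration", "velocity"]
MASK = -1  # hypothetical mask id

def bar_level_mask(tokens, mask_prob=0.15):
    """Bar-level masking sketch: sample (bar, element) pairs and mask that
    element in *every* token of the bar, so the model cannot recover the
    value by copying it from a neighboring note in the same bar."""
    bars = {t.bar for t in tokens}
    targets = {(b, e) for b in bars for e in ELEMENTS
               if random.random() < mask_prob}
    for t in tokens:
        b = t.bar  # remember the original bar before any masking
        for e in ELEMENTS:
            if (b, e) in targets:
                setattr(t, e, MASK)
    return tokens
```

Compared with token-level masking, masking a whole bar's worth of one element type removes the shortcut of predicting a masked value from identical values on adjacent notes in the same bar, which is the leakage the bar-level strategy is designed to avoid.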
Related papers
- MMT-BERT: Chord-aware Symbolic Music Generation Based on Multitrack Music Transformer and MusicBERT [44.204383306879095]
We propose a novel symbolic music representation and Generative Adversarial Network (GAN) framework specially designed for symbolic multitrack music generation.
To build a robust multitrack music generator, we fine-tune a pre-trained MusicBERT model to serve as the discriminator, and incorporate a relativistic standard loss.
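For context, the "relativistic standard loss" mentioned here is the standard RSGAN formulation, in which the discriminator scores real samples relative to fake ones. A minimal sketch of that generic loss family (not MMT-BERT's exact code):

```python
import torch
import torch.nn.functional as F

def rsgan_d_loss(real_logits, fake_logits):
    # Discriminator: push real samples to score higher than fakes,
    # judged relative to each other rather than in isolation.
    return F.binary_cross_entropy_with_logits(
        real_logits - fake_logits, torch.ones_like(real_logits))

def rsgan_g_loss(real_logits, fake_logits):
    # Generator: push fakes to score higher than reals.
    return F.binary_cross_entropy_with_logits(
        fake_logits - real_logits, torch.ones_like(fake_logits))
```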
arXiv Detail & Related papers (2024-09-02T03:18:56Z) - Adversarial-MidiBERT: Symbolic Music Understanding Model Based on Unbias Pre-training and Mask Fine-tuning [2.61072980439312]
We propose Adversarial-MidiBERT, a symbolic music understanding model based on Bidirectional Encoder Representations from Transformers (BERT).
We introduce an unbiased pre-training method based on adversarial learning to minimize the participation of tokens that lead to biases during training. Furthermore, we propose a mask fine-tuning method to narrow the data gap between pre-training and fine-tuning, which helps the model converge faster and perform better.
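Reading from the summary alone, the mask fine-tuning idea amounts to keeping a small amount of [MASK] corruption in the fine-tuning inputs so they resemble the pre-training distribution. A minimal sketch, with the mask id and rate as assumptions:

```python
import random

MASK_ID = 1  # hypothetical [MASK] token id

def mask_finetune_inputs(token_ids, p=0.1):
    # Keep masking a small fraction of tokens during fine-tuning so the
    # model still sees [MASK] symbols, narrowing the pre-train/fine-tune
    # input gap described in the summary (p is an assumed rate).
    return [MASK_ID if random.random() < p else t for t in token_ids]

# Usage on a toy token sequence:
print(mask_finetune_inputs([5, 9, 12, 7, 3], p=0.2))
```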
arXiv Detail & Related papers (2024-07-11T08:54:38Z) - PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training [8.484581633133542]
PianoBART is a pre-trained model that uses BART for both symbolic piano music generation and understanding.
We devise a multi-level object selection strategy for different pre-training tasks of PianoBART, which can prevent information leakage or loss.
Experiments demonstrate that PianoBART efficiently learns musical patterns and achieves outstanding performance in generating high-quality coherent pieces.
arXiv Detail & Related papers (2024-06-26T03:35:54Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations.
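As a concrete picture of the GETScore layout described above, here is a minimal sketch that builds such a track-by-time token grid; the (track, step, pitch) input format and the padding id are assumptions, and the real GETScore packs richer pitch/duration tokens per cell:

```python
import numpy as np

PAD = 0  # hypothetical id meaning "no note at this cell"

def build_getscore(notes, num_tracks, num_steps):
    """Track-by-time token grid in the spirit of GETScore: rows are
    tracks (stacked vertically), columns are time steps (progressing
    horizontally). Each cell here holds one shifted MIDI pitch."""
    score = np.full((num_tracks, num_steps), PAD, dtype=np.int64)
    for track, step, pitch in notes:  # assumed (track, step, pitch) triples
        score[track, step] = pitch + 1  # shift so PAD stays reserved
    return score

# Usage: a two-track toy fragment over 8 steps.
grid = build_getscore([(0, 0, 60), (0, 4, 64), (1, 0, 36)],
                      num_tracks=2, num_steps=8)
print(grid)
```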
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - Museformer: Transformer with Fine- and Coarse-Grained Attention for
Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to musical structure.
With the coarse-grained attention, a token attends only to summarizations of the other bars rather than to each of their tokens, reducing the computational cost.
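The fine-grained half of this scheme can be captured as an attention mask over bar indices. The sketch below does so, with the set of "structure-relevant" bar offsets as an illustrative assumption and the coarse-grained summary tokens omitted for brevity:

```python
import torch

def museformer_style_mask(bar_ids, related_offsets=(1, 2, 4, 8)):
    """Boolean [n, n] mask where mask[i, j] = True lets token i attend to
    token j: tokens of its own bar, plus tokens of bars at the given
    offsets before it (offsets here are illustrative)."""
    bar = torch.as_tensor(bar_ids)
    delta = bar.unsqueeze(1) - bar.unsqueeze(0)  # delta[i, j] = bar_i - bar_j
    mask = delta == 0                            # fine-grained: same bar
    for off in related_offsets:
        mask |= delta == off                     # fine-grained: related bars
    return mask

# Usage: 5 bars with 3 tokens each.
bar_ids = [b for b in range(5) for _ in range(3)]
print(museformer_style_mask(bar_ids).shape)  # torch.Size([15, 15])
```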
arXiv Detail & Related papers (2022-10-19T07:31:56Z) - A Novel Multi-Task Learning Method for Symbolic Music Emotion
Recognition [76.65908232134203]
Symbolic Music Emotion Recognition (SMER) aims to predict music emotion from symbolic data, such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
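The multi-task framework described here follows the usual shared-encoder pattern. A minimal sketch, where the auxiliary head, label sets, and loss weight are all assumptions rather than the paper's exact tasks:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskEmotionHead(nn.Module):
    """Shared-encoder multi-task head: one emotion classifier plus an
    auxiliary emotion-related classifier on the same pooled features."""
    def __init__(self, hidden=768, n_emotions=4, n_aux=2):
        super().__init__()
        self.emotion = nn.Linear(hidden, n_emotions)
        self.aux = nn.Linear(hidden, n_aux)

    def forward(self, pooled, emo_labels, aux_labels, aux_weight=0.5):
        # Total loss = main emotion loss + weighted auxiliary loss.
        loss = F.cross_entropy(self.emotion(pooled), emo_labels)
        loss = loss + aux_weight * F.cross_entropy(self.aux(pooled), aux_labels)
        return loss

# Usage with random features standing in for an encoder's pooled output:
head = MultiTaskEmotionHead()
pooled = torch.randn(8, 768)
loss = head(pooled, torch.randint(0, 4, (8,)), torch.randint(0, 2, (8,)))
```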
arXiv Detail & Related papers (2022-01-15T07:45:10Z) - Personalized Popular Music Generation Using Imitation and Structure [1.971709238332434]
We propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style from a given example seed song.
An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music.
arXiv Detail & Related papers (2021-05-10T23:43:00Z) - Music Embedding: A Tool for Incorporating Music Theory into
Computational Music Applications [0.3553493344868413]
It is important to digitally represent music in a music-theoretic and concise manner.
Existing approaches for representing music make poor use of music theory.
arXiv Detail & Related papers (2021-04-24T04:32:45Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)