BERT-like Pre-training for Symbolic Piano Music Classification Tasks
- URL: http://arxiv.org/abs/2107.05223v2
- Date: Sun, 14 Apr 2024 03:40:35 GMT
- Title: BERT-like Pre-training for Symbolic Piano Music Classification Tasks
- Authors: Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang
- Abstract summary: This article presents a benchmark study of symbolic piano music classification using the Bidirectional Encoder Representations from Transformers (BERT) approach.
We pre-train two 12-layer Transformer models using the BERT approach and fine-tune them for four downstream classification tasks.
Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
- Score: 15.02723006489356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
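To make the recipe concrete, here is a minimal PyTorch sketch of the BERT-style masked-token pre-training the abstract describes. The 12-layer depth matches the abstract; everything else (the vocabulary size, special-token ids, BERT-base-like 768-dimension/12-head configuration, 15% masking rate, and names such as `MidiBert` and `mlm_step`) is an illustrative assumption rather than the paper's actual setup, and the masking is simplified to always substitute [MASK], omitting BERT's 80/10/10 corruption rule.

```python
# Sketch of BERT-style masked-token pre-training on symbolic piano music.
# Assumes the MIDI has already been tokenized into integer event ids
# (e.g., a REMI-like vocabulary); all constants below are illustrative.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed size of the MIDI event vocabulary
PAD_ID     = 0      # assumed id of the padding token
MASK_ID    = 1      # assumed id of the special [MASK] token

class MidiBert(nn.Module):
    """12-layer Transformer encoder over MIDI event tokens."""
    def __init__(self, d_model=768, n_heads=12, n_layers=12, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)  # pre-training only

    def forward(self, tokens):                      # tokens: (batch, seq)
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        h = self.encoder(h, src_key_padding_mask=tokens.eq(PAD_ID))
        return h                                    # (batch, seq, d_model)

def mlm_step(model, tokens, mask_prob=0.15):
    """One masked-language-model step: mask ~15% of tokens, predict them."""
    labels = tokens.clone()
    mask = (torch.rand(tokens.shape, device=tokens.device) < mask_prob) \
           & tokens.ne(PAD_ID)
    labels[~mask] = -100                   # score only the masked positions
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model.mlm_head(model(corrupted))
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1),
        ignore_index=-100)
```

After pre-training, the same encoder would be reused with task-specific heads. A sketch of the distinction the abstract draws between the two task types, with head sizes again assumed:

```python
note_head = nn.Linear(768, 2)   # note-level, e.g. melody vs. non-melody
seq_head  = nn.Linear(768, 4)   # sequence-level, e.g. 4-class emotion

model = MidiBert()
tokens = torch.randint(2, VOCAB_SIZE, (8, 512))   # a dummy token batch
h = model(tokens)                                 # (8, 512, 768)
note_logits = note_head(h)                        # one prediction per token
seq_logits  = seq_head(h.mean(dim=1))             # pool, one per sequence
```

A training loop would call `mlm_step` on batches of tokenized MIDI and backpropagate the returned loss; fine-tuning instead minimizes cross-entropy on the head outputs, supervised per token for note-level tasks and per clip for sequence-level tasks.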
Related papers
- End-to-end Piano Performance-MIDI to Score Conversion with Transformers [26.900974153235456]
We present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files.
We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data.
Our method is also the first to directly predict notational details like trill marks or stem direction from performance data.
arXiv Detail & Related papers (2024-09-30T20:11:37Z) - Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z) - Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS (realistic-music-score based singing voice synthesis) aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Large-Scale MIDI-based Composer Classification [13.815200249190529]
We propose large-scale MIDI-based composer classification systems using GiantMIDI-Piano.
We are the first to investigate the composer classification problem with up to 100 composers.
Our system achieves 10-composer and 100-composer classification accuracies of 0.648 and 0.385, respectively.
arXiv Detail & Related papers (2020-10-28T08:07:55Z) - A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation [0.0]
The aim is to obtain a model that can estimate the probability that a MIDI clip was composed under the auto-generation hypothesis.
The experimental results show our model ranks 3rd among all 7 teams in the data challenge at CSMT (2020).
arXiv Detail & Related papers (2020-10-15T13:59:58Z) - Deep Composer Classification Using Symbolic Representation [6.656753488329095]
In this study, we train deep neural networks to classify composers in the symbolic domain.
The model takes a two-channel two-dimensional input, which is converted from MIDI recordings, and performs single-label classification.
In experiments conducted on the MAESTRO dataset, we report an F1 score of 0.8333 for the classification of 13 classical composers.
arXiv Detail & Related papers (2020-10-02T07:40:44Z) - PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z) - Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z) - Composer Style Classification of Piano Sheet Music Images Using Language
Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format.
Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation.
We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
arXiv Detail & Related papers (2020-07-29T04:13:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.