BERT-like Pre-training for Symbolic Piano Music Classification Tasks
- URL: http://arxiv.org/abs/2107.05223v2
- Date: Sun, 14 Apr 2024 03:40:35 GMT
- Title: BERT-like Pre-training for Symbolic Piano Music Classification Tasks
- Authors: Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang
- Abstract summary: This article presents a benchmark study of symbolic piano music classification using the Bidirectional Encoder Representations from Transformers (BERT) approach.
We pre-train two 12-layer Transformer models using the BERT approach and fine-tune them for four downstream classification tasks.
Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
- Score: 15.02723006489356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by its composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
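To make the recipe concrete, here is a minimal PyTorch sketch of the BERT-style masked-token pre-training the abstract describes. The 12-layer depth matches the abstract; everything else (the vocabulary size, special-token ids, BERT-base-like 768-dimension/12-head configuration, 15% masking rate, and names such as `MidiBert` and `mlm_step`) is an illustrative assumption rather than the paper's actual setup, and the masking is simplified to always substitute [MASK], omitting BERT's 80/10/10 corruption rule.

```python
# Sketch of BERT-style masked-token pre-training on symbolic piano music.
# Assumes the MIDI has already been tokenized into integer event ids
# (e.g., a REMI-like vocabulary); all constants below are illustrative.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumed size of the MIDI event vocabulary
PAD_ID     = 0      # assumed id of the padding token
MASK_ID    = 1      # assumed id of the special [MASK] token

class MidiBert(nn.Module):
    """12-layer Transformer encoder over MIDI event tokens."""
    def __init__(self, d_model=768, n_heads=12, n_layers=12, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, d_model, padding_idx=PAD_ID)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.mlm_head = nn.Linear(d_model, VOCAB_SIZE)  # pre-training only

    def forward(self, tokens):                      # tokens: (batch, seq)
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.tok_emb(tokens) + self.pos_emb(pos)
        h = self.encoder(h, src_key_padding_mask=tokens.eq(PAD_ID))
        return h                                    # (batch, seq, d_model)

def mlm_step(model, tokens, mask_prob=0.15):
    """One masked-language-model step: mask ~15% of tokens, predict them."""
    labels = tokens.clone()
    mask = (torch.rand(tokens.shape, device=tokens.device) < mask_prob) \
           & tokens.ne(PAD_ID)
    labels[~mask] = -100                   # score only the masked positions
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model.mlm_head(model(corrupted))
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), labels.reshape(-1),
        ignore_index=-100)
```

After pre-training, the same encoder would be reused with task-specific heads. A sketch of the distinction the abstract draws between the two task types, with head sizes again assumed:

```python
note_head = nn.Linear(768, 2)   # note-level, e.g. melody vs. non-melody
seq_head  = nn.Linear(768, 4)   # sequence-level, e.g. 4-class emotion

model = MidiBert()
tokens = torch.randint(2, VOCAB_SIZE, (8, 512))   # a dummy token batch
h = model(tokens)                                 # (8, 512, 768)
note_logits = note_head(h)                        # one prediction per token
seq_logits  = seq_head(h.mean(dim=1))             # pool, one per sequence
```

A training loop would call `mlm_step` on batches of tokenized MIDI and backpropagate the returned loss; fine-tuning instead minimizes cross-entropy on the head outputs, supervised per token for note-level tasks and per clip for sequence-level tasks.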
Related papers
- End-to-end Piano Performance-MIDI to Score Conversion with Transformers [26.900974153235456]
We present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files.
We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data.
Our method is also the first to directly predict notational details like trill marks or stem direction from performance data.
arXiv Detail & Related papers (2024-09-30T20:11:37Z) - Accompanied Singing Voice Synthesis with Fully Text-controlled Melody [61.147446955297625]
Text-to-song (TTSong) is a music generation task that synthesizes accompanied singing voices.
We present MelodyLM, the first TTSong model that generates high-quality song pieces with fully text-controlled melodies.
arXiv Detail & Related papers (2024-07-02T08:23:38Z) - Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, one of the first VAE-based methods to effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks".
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS (realistic-music-score based singing voice synthesis) aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Large-Scale MIDI-based Composer Classification [13.815200249190529]
We propose large-scale MIDI-based composer classification systems using GiantMIDI-Piano.
We are the first to investigate the composer classification problem with up to 100 composers.
Our system achieves 10-composer and 100-composer classification accuracies of 0.648 and 0.385, respectively.
arXiv Detail & Related papers (2020-10-28T08:07:55Z) - A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation [0.0]
The aim is to obtain a model that can estimate the probability that a MIDI clip was composed under the auto-generation hypothesis.
The experimental results show our model ranks 3rd among all 7 teams in the data challenge at CSMT (2020).
arXiv Detail & Related papers (2020-10-15T13:59:58Z) - Deep Composer Classification Using Symbolic Representation [6.656753488329095]
In this study, we train deep neural networks to classify composers in the symbolic domain.
The model takes a two-channel two-dimensional input, which is converted from MIDI recordings, and performs single-label classification.
In experiments conducted on the MAESTRO dataset, we report an F1 score of 0.8333 for the classification of 13 classical composers.
arXiv Detail & Related papers (2020-10-02T07:40:44Z) - PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z) - Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods that incorporate score information into music performance assessment (MPA) models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z) - Composer Style Classification of Piano Sheet Music Images Using Language
Model Pretraining [16.23438816698455]
We recast the problem to be based on raw sheet music images rather than a symbolic music format.
Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation.
We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP.
arXiv Detail & Related papers (2020-07-29T04:13:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.