A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
- URL: http://arxiv.org/abs/2010.07758v3
- Date: Mon, 1 Feb 2021 05:06:55 GMT
- Title: A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
- Authors: Mingshuo Ding, Yinghao Ma
- Abstract summary: The aim is to obtain a model that can estimate the probability of a MIDI clip conditioned on the auto-generation hypothesis.
The experimental results show our model ranks 3rd among all 7 teams in the CSMT (2020) data challenge.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent achievements of deep learning automatic music generation
algorithms, few approaches have been proposed to evaluate whether a
single-track music excerpt is composed by automatons or Homo sapiens. To tackle
this problem, we apply a masked language model based on ALBERT for composer
classification. The aim is to obtain a model that can estimate the probability of a
MIDI clip conditioned on the auto-generation hypothesis, and which is trained with
only AI-composed single-track MIDI. In this paper, the number of parameters is
reduced, and two data augmentation methods are proposed, together with a refined
loss function to prevent overfitting. The experimental results show that our model
ranks $3^{rd}$ among all $7$ teams in the CSMT (2020) data challenge. Furthermore,
this method could be extended to other music information retrieval tasks that are
based on small datasets.
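The abstract describes an ALBERT-style masked language model that is trained only on AI-composed single-track MIDI and then used to score how likely a clip is under the auto-generation hypothesis. The paper's scoring procedure is not reproduced here; the sketch below shows one common way to turn a masked language model into a sequence scorer, the pseudo-log-likelihood, where each position is masked in turn and the log-probability of the true pitch is accumulated. The `model` interface, the vocabulary, and `MASK_ID` are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: `model` is any masked language model over a pitch
# vocabulary (e.g. 128 MIDI pitches plus special tokens) that maps a batch of
# token ids (batch, seq_len) to logits of shape (batch, seq_len, vocab_size).
MASK_ID = 128  # assumed id of the [MASK] token; not specified in the paper


def pseudo_log_likelihood(model, pitch_ids: torch.Tensor, mask_id: int = MASK_ID) -> float:
    """Score a single pitch sequence under a masked LM.

    Masks one position at a time and accumulates the log-probability the
    model assigns to the true pitch at that position. Higher scores mean
    the clip looks more like the (AI-composed) training distribution.
    """
    model.eval()
    seq_len = pitch_ids.shape[0]
    total = 0.0
    with torch.no_grad():
        for pos in range(seq_len):
            masked = pitch_ids.clone()
            masked[pos] = mask_id
            logits = model(masked.unsqueeze(0))          # (1, seq_len, vocab)
            log_probs = F.log_softmax(logits[0, pos], dim=-1)
            total += log_probs[pitch_ids[pos]].item()
    return total / seq_len  # length-normalised score
```

Clips can then be ranked by this score, and a threshold on it yields a binary machine-vs-human decision. The two data augmentation methods mentioned in the abstract are likewise not spelled out in this summary; as a purely illustrative example, a common label-preserving augmentation for pitch sequences is chromatic transposition:

```python
import random

def transpose_augment(pitches: list[int], max_shift: int = 6) -> list[int]:
    """Illustrative augmentation: shift every pitch by the same random number
    of semitones, clamping the result to the MIDI range 0-127."""
    shift = random.randint(-max_shift, max_shift)
    return [min(127, max(0, p + shift)) for p in pitches]
```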
Related papers
- SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation [75.86473375730392]
SongGen is a fully open-source, single-stage auto-regressive transformer for controllable song generation.
It supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately.
To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline.
arXiv Detail & Related papers (2025-02-18T18:52:21Z)
- Detecting Music Performance Errors with Transformers [3.6837762419929168]
Existing tools for music error detection rely on automatic alignment.
There is a lack of sufficient data to train music error detection models.
We present a novel data generation technique capable of creating large-scale synthetic music error datasets.
arXiv Detail & Related papers (2025-01-03T07:04:20Z)
- Parameter-Efficient Transfer Learning for Music Foundation Models [51.61531917413708]
We investigate the use of parameter-efficient transfer learning (PETL) for music foundation models.
PETL methods outperform both probing and fine-tuning on music auto-tagging.
PETL methods achieve results similar to fine-tuning with significantly less training cost.
arXiv Detail & Related papers (2024-11-28T20:50:40Z)
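The PETL entry above contrasts probing, full fine-tuning, and parameter-efficient transfer learning. As a generic illustration only (the specific PETL variants evaluated in that paper are not listed here), the sketch below shows a bottleneck adapter, a standard PETL building block that adds a small trainable residual module while the pretrained backbone stays frozen.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter: down-project, non-linearity, up-project, residual add.
    Only these few parameters are trained; the backbone stays frozen."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

In practice such a module is inserted after each frozen transformer block, and only the adapter and task-head parameters receive gradients, which is what keeps the training cost low relative to full fine-tuning.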
- Notochord: A Flexible Probabilistic Model for Real-Time MIDI Performance [0.8192907805418583]
Notochord is a deep probabilistic model for sequences of structured events.
It can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds.
arXiv Detail & Related papers (2024-03-18T17:35:02Z)
- Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
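The MusicGen entry above attributes its single-stage design to efficient token interleaving over several parallel streams of discrete audio tokens. As an illustration of the general idea (not the paper's exact pattern), the sketch below applies a simple delay interleaving in which codebook k is shifted right by k steps, so that a single autoregressive pass can predict all codebooks; `PAD` is an assumed placeholder id.

```python
import numpy as np

PAD = -1  # assumed padding id used while the streams are offset

def delay_interleave(codes: np.ndarray) -> np.ndarray:
    """Apply a simple delay pattern to multi-codebook tokens.

    codes: (num_codebooks, seq_len) array of discrete audio tokens.
    Returns (num_codebooks, seq_len + num_codebooks - 1) where codebook k
    is shifted right by k steps, so at decoding step t the model only
    depends on tokens that are already available.
    """
    k, t = codes.shape
    out = np.full((k, t + k - 1), PAD, dtype=codes.dtype)
    for i in range(k):
        out[i, i:i + t] = codes[i]
    return out


# Example: 4 codebooks, 6 frames of tokens
tokens = np.arange(24).reshape(4, 6)
print(delay_interleave(tokens))
```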
- jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research [2.9697051524971743]
The jazznet dataset contains 162520 labeled piano patterns, including chords, arpeggios, scales, and chord progressions with their inversions.
The paper explains the dataset's composition, creation, and generation, and presents an open-source Pattern Generator.
We demonstrate that the dataset can help researchers benchmark new models for challenging MIR tasks, using a convolutional recurrent neural network (CRNN) and a deep convolutional neural network.
arXiv Detail & Related papers (2023-02-17T00:13:22Z)
- Melody transcription via generative pre-training [86.08508957229348]
A key challenge in melody transcription is building methods that can handle broad audio containing any number of instrument ensembles and musical styles.
To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio.
We derive a new dataset containing $50$ hours of melody transcriptions from crowdsourced annotations of broad music.
arXiv Detail & Related papers (2022-12-04T18:09:23Z)
- BERT-like Pre-training for Symbolic Piano Music Classification Tasks [15.02723006489356]
This article presents a benchmark study of symbolic piano music classification using the Bidirectional Representations from Transformers (BERT) approach.
We pre-train two 12-layer Transformer models using the BERT approach and fine-tune them for four downstream classification tasks.
Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
arXiv Detail & Related papers (2021-07-12T07:03:57Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation as PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
- RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.