Contrastive Learning of Musical Representations
- URL: http://arxiv.org/abs/2103.09410v1
- Date: Wed, 17 Mar 2021 02:53:55 GMT
- Title: Contrastive Learning of Musical Representations
- Authors: Janne Spijkervet, John Ashley Burgoyne
- Abstract summary: We introduce SimCLR to the music domain to form a framework for self-supervised learning of raw waveforms of music: CLMR.
We show that CLMR's representations are transferable using out-of-domain datasets, indicating that they capture important musical knowledge.
To foster and future research on self-supervised learning in music, we publicly release the pre-trained models and the source code of all experiments of this paper on GitHub.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While supervised learning has enabled great advances in many areas of music,
labeled music datasets remain especially hard, expensive and time-consuming to
create. In this work, we introduce SimCLR to the music domain and contribute a
large chain of audio data augmentations, to form a simple framework for
self-supervised learning of raw waveforms of music: CLMR. This approach
requires no manual labeling and no preprocessing of music to learn useful
representations. We evaluate CLMR in the downstream task of music
classification on the MagnaTagATune and Million Song datasets. A linear
classifier fine-tuned on representations from a pre-trained CLMR model achieves
an average precision of 35.4% on the MagnaTagATune dataset, superseding fully
supervised models that currently achieve a score of 34.9%. Moreover, we show
that CLMR's representations are transferable using out-of-domain datasets,
indicating that they capture important musical knowledge. Lastly, we show that
self-supervised pre-training allows us to learn efficiently on smaller labeled
datasets: we still achieve a score of 33.1% despite using only 259 labeled
songs during fine-tuning. To foster reproducibility and future research on
self-supervised learning in music, we publicly release the pre-trained models
and the source code of all experiments of this paper on GitHub.
Related papers
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - Self-Supervised Contrastive Learning for Robust Audio-Sheet Music
Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z) - RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Supervised and Unsupervised Learning of Audio Representations for Music
Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z) - Unaligned Supervision For Automatic Music Transcription in The Wild [1.2183405753834562]
NoteEM is a method for simultaneously training a transcriber and aligning the scores to their corresponding performances.
We report SOTA note-level accuracy of the MAPS dataset, and large favorable margins on cross-dataset evaluations.
arXiv Detail & Related papers (2022-04-28T17:31:43Z) - Multi-task Learning with Metadata for Music Mood Classification [0.0]
Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation.
We propose a multi-task learning approach in which a shared model is simultaneously trained for mood and metadata prediction tasks.
Applying our technique on the existing state-of-the-art convolutional neural networks for mood classification improves their performances consistently.
arXiv Detail & Related papers (2021-10-10T11:36:34Z) - dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.