An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
- URL: http://arxiv.org/abs/2404.09177v1
- Date: Sun, 14 Apr 2024 07:56:08 GMT
- Title: An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
- Authors: Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin,
- Abstract summary: Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data.
In this study, we investigate and compare the performance of new self-supervised methods for music tagging.
- Score: 6.363158395541767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data. It is particularly compelling in the music domain, where obtaining labeled data is time-consuming, error-prone, and ambiguous. During the self-supervised process, models are trained on pretext tasks, with the primary objective of acquiring robust and informative features that can later be fine-tuned for specific downstream tasks. The choice of the pretext task is critical as it guides the model to shape the feature space with meaningful constraints for information encoding. In the context of music, most works have relied on contrastive learning or masking techniques. In this study, we expand the scope of pretext tasks applied to music by investigating and comparing the performance of new self-supervised methods for music tagging. We open-source a simple ResNet model trained on a diverse catalog of millions of tracks. Our results demonstrate that, although most of these pre-training methods result in similar downstream results, contrastive learning consistently results in better downstream performance compared to other self-supervised pre-training methods. This holds true in a limited-data downstream context.
Related papers
- Music auto-tagging in the long tail: A few-shot approach [45.873301228345696]
We propose to integrate few-shot learning methodology into multi-label music auto-tagging.
Our experiments demonstrate that a simple model with pre-trained features can achieve performance close to state-of-the-art models.
arXiv Detail & Related papers (2024-09-12T03:33:19Z) - Self-Supervised Contrastive Learning for Robust Audio-Sheet Music
Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z) - Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music
Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z) - Supervised and Unsupervised Learning of Audio Representations for Music
Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z) - Self-Supervised Visual Representation Learning Using Lightweight
Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
arXiv Detail & Related papers (2021-10-21T14:13:10Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Multi-Task Self-Supervised Pre-Training for Music Classification [36.21650132145048]
We apply self-supervised and multi-task learning methods for pre-training music encoders.
We investigate how these design choices interact with various downstream music classification tasks.
arXiv Detail & Related papers (2021-02-05T15:19:58Z) - Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z) - How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.