Multi-Task Self-Supervised Pre-Training for Music Classification
- URL: http://arxiv.org/abs/2102.03229v1
- Date: Fri, 5 Feb 2021 15:19:58 GMT
- Title: Multi-Task Self-Supervised Pre-Training for Music Classification
- Authors: Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee,
Juan Pablo Bello, Chao Wang
- Abstract summary: We apply self-supervised and multi-task learning methods for pre-training music encoders.
We investigate how these design choices interact with various downstream music classification tasks.
- Score: 36.21650132145048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning is very data hungry, and supervised learning in
particular requires massive labeled data to work well. Machine listening
research often suffers from a shortage of labeled data, as human annotations
are costly to acquire, and annotating audio is time consuming and less
intuitive. Besides, models learned from a labeled dataset often embed biases
specific to that particular dataset. Therefore, unsupervised learning
techniques have become popular approaches for solving machine listening
problems. In particular, a self-supervised learning technique that
reconstructs multiple hand-crafted audio features has shown promising results
when applied to speech tasks such as emotion recognition and automatic speech
recognition (ASR). In this paper, we apply self-supervised and multi-task
learning methods to pre-train music encoders, and explore various design
choices including encoder architectures, weighting mechanisms to combine
losses from multiple tasks, and the selection of pretext-task workers. We
investigate how these design choices interact with various downstream music
classification tasks. We find that using multiple music-specific workers
together with weighting mechanisms that balance the losses during
pre-training improves performance and generalization on downstream tasks.
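
The abstract describes a shared music encoder whose embedding feeds several pretext-task "workers", each reconstructing a hand-crafted audio feature, with a weighting mechanism that balances the per-task losses. Below is a minimal PyTorch sketch of that idea; the encoder architecture, the three feature targets and their dimensions, and the uncertainty-based weighting (Kendall et al., 2018) are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of multi-task self-supervised pre-training with loss
# weighting. Module names, feature choices, and dimensions are illustrative
# assumptions, not the exact architecture or workers used in the paper.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared music encoder: log-mel spectrogram -> embedding."""
    def __init__(self, emb_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x):          # x: (batch, 1, n_mels, time)
        return self.net(x)

class Worker(nn.Module):
    """Pretext-task head that reconstructs one hand-crafted feature."""
    def __init__(self, emb_dim, target_dim):
        super().__init__()
        self.head = nn.Linear(emb_dim, target_dim)

    def forward(self, z):
        return self.head(z)

class MultiTaskPretrainer(nn.Module):
    """Combines per-worker reconstruction losses with learned
    uncertainty-based weights -- one plausible weighting mechanism;
    the paper compares several."""
    def __init__(self, encoder, worker_dims, emb_dim=256):
        super().__init__()
        self.encoder = encoder
        self.workers = nn.ModuleDict(
            {name: Worker(emb_dim, dim) for name, dim in worker_dims.items()})
        self.log_vars = nn.ParameterDict(
            {name: nn.Parameter(torch.zeros(())) for name in worker_dims})

    def forward(self, x, targets):
        z = self.encoder(x)
        total = 0.0
        for name, worker in self.workers.items():
            mse = nn.functional.mse_loss(worker(z), targets[name])
            # Weight each task's loss by a learned homoscedastic uncertainty.
            total = total + torch.exp(-self.log_vars[name]) * mse + self.log_vars[name]
        return total

# Example: three music-specific pretext targets (dimensions are made up).
model = MultiTaskPretrainer(Encoder(), {"mfcc": 20, "chroma": 12, "tempogram": 40})
x = torch.randn(8, 1, 64, 128)
targets = {"mfcc": torch.randn(8, 20),
           "chroma": torch.randn(8, 12),
           "tempogram": torch.randn(8, 40)}
loss = model(x, targets)
loss.backward()
```

After pre-training, the encoder would be kept and the workers discarded; the embedding is then fine-tuned or probed on downstream music classification tasks.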
Related papers
- An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging [6.363158395541767]
Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data.
In this study, we investigate and compare the performance of new self-supervised methods for music tagging.
arXiv Detail & Related papers (2024-04-14T07:56:08Z) - Exploring Memorization in Fine-tuned Language Models [53.52403444655213]
We conduct the first comprehensive analysis to explore language models' memorization during fine-tuning across tasks.
Our studies with open-sourced and our own fine-tuned LMs across various tasks indicate that memorization presents a strong disparity among different fine-tuning tasks.
We provide an intuitive explanation of this task disparity via sparse coding theory and unveil a strong correlation between memorization and attention score distribution.
arXiv Detail & Related papers (2023-10-10T15:41:26Z) - Self-Supervised Learning for Audio-Based Emotion Recognition [1.7598252755538808]
Self-supervised learning is a family of methods which can learn despite a scarcity of supervised labels.
We have applied self-supervised pre-training to the classification of emotions from the acoustic modality of CMU-MOSEI.
We find that self-supervised learning consistently improves the performance of the model across all metrics.
arXiv Detail & Related papers (2023-07-23T14:40:50Z) - Pre-training Multi-task Contrastive Learning Models for Scientific
Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z) - Music Instrument Classification Reprogrammed [79.68916470119743]
"Reprogramming" is a technique that utilizes pre-trained deep and complex neural networks originally targeting a different task by modifying and mapping both the input and output of the pre-trained model.
We demonstrate that reprogramming can effectively leverage the power of the representation learned for a different task and that the resulting reprogrammed system can perform on par or even outperform state-of-the-art systems at a fraction of training parameters.
arXiv Detail & Related papers (2022-11-15T18:26:01Z) - Supervised and Unsupervised Learning of Audio Representations for Music
Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z) - MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z) - A Study of Few-Shot Audio Classification [2.1989764549743476]
Few-shot learning is a type of machine learning designed to enable the model to generalize to new classes with very few examples.
We evaluate our model for speaker identification on the VoxCeleb dataset and ICSI Meeting Corpus, obtaining 5-shot 5-way accuracies of 93.5% and 54.0%, respectively.
We also evaluate activity classification from audio using few-shot subsets of the Kinetics600 dataset and AudioSet, both drawn from YouTube videos, obtaining 51.5% and 35.2% accuracy, respectively.
arXiv Detail & Related papers (2020-12-02T22:19:16Z) - Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
arXiv Detail & Related papers (2020-11-15T10:21:28Z)
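
The "Music Instrument Classification Reprogrammed" entry above describes reprogramming: keeping a pre-trained model frozen while learning a transformation of its input and a mapping of its output onto the new task's labels. A minimal PyTorch sketch follows; the stand-in backbone, class counts, and module names are hypothetical, not the cited paper's implementation.

```python
# Minimal sketch of model reprogramming for a new classification task.
# The frozen backbone, class counts, and training setup are illustrative
# assumptions, not the specific system described in the cited paper.
import torch
import torch.nn as nn

class Reprogrammer(nn.Module):
    def __init__(self, pretrained, input_shape, n_source, n_target):
        super().__init__()
        self.pretrained = pretrained.eval()
        for p in self.pretrained.parameters():    # keep original weights frozen
            p.requires_grad = False
        # Learned additive perturbation applied to every input (input reprogramming).
        self.delta = nn.Parameter(torch.zeros(input_shape))
        # Learned linear mapping from source-task outputs to target labels (output mapping).
        self.label_map = nn.Linear(n_source, n_target)

    def forward(self, x):
        source_logits = self.pretrained(x + self.delta)
        return self.label_map(source_logits)

# Only `delta` and `label_map` are trained -- a tiny fraction of the
# parameters of the frozen backbone.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 128, 527))  # stand-in for a pretrained audio model
model = Reprogrammer(backbone, input_shape=(1, 64, 128), n_source=527, n_target=11)
optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
```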