A Study of Transfer Learning in Music Source Separation
- URL: http://arxiv.org/abs/2010.12650v1
- Date: Fri, 23 Oct 2020 20:29:47 GMT
- Title: A Study of Transfer Learning in Music Source Separation
- Authors: Andreas Bugler, Bryan Pardo, Prem Seetharaman
- Abstract summary: It is well known that transferring learning from related domains can result in a performance boost for deep learning systems.
In this work we investigate the effectiveness of data augmentation during pretraining.
We also explore how much of a model must be retrained on the final target task, once pretrained.
- Score: 12.819592416207728
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Supervised deep learning methods for performing audio source separation can
be very effective in domains where there is a large amount of training data.
While some music domains have enough data suitable for training a separation
system, such as rock and pop genres, many musical domains do not, such as
classical music, choral music, and non-Western music traditions. It is well
known that transferring learning from related domains can result in a
performance boost for deep learning systems, but it is not always clear how
best to do pretraining. In this work we investigate the effectiveness of data
augmentation during pretraining, the impact on performance as a result of
pretraining and downstream datasets having similar content domains, and also
explore how much of a model must be retrained on the final target task, once
pretrained.
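The two questions the abstract raises (whether to augment during pretraining, and how much of the network to retrain on the target domain) map onto a standard freeze-and-fine-tune recipe. The sketch below is a minimal PyTorch illustration of that recipe; the tiny model, the random tensors standing in for spectrograms, and the encoder/head split are placeholders for illustration, not the architecture or data used in the paper.
```python
# Minimal sketch of pretrain-then-partially-retrain for source separation.
# The tiny model, random "spectrogram" tensors, and layer split are placeholders.
import torch
import torch.nn as nn

class TinySeparator(nn.Module):
    """Stand-in separation model: an encoder followed by a mask-estimation head."""
    def __init__(self, n_bins=257):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 128), nn.ReLU(),
                                     nn.Linear(128, 128), nn.ReLU())
        self.mask_head = nn.Sequential(nn.Linear(128, n_bins), nn.Sigmoid())

    def forward(self, mix_spec):
        return self.mask_head(self.encoder(mix_spec)) * mix_spec

def train(model, params, mixes, targets, steps=100, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(mixes), targets)
        loss.backward()
        opt.step()

model = TinySeparator()

# 1) Pretrain on a large source domain (e.g. pop/rock mixtures); random stand-ins here.
big_mix, big_src = torch.rand(64, 257), torch.rand(64, 257)
train(model, model.parameters(), big_mix, big_src)

# 2) Transfer: freeze the encoder and retrain only the mask head on the small
#    target domain (e.g. choral music). Choosing how many layers to freeze is
#    the "how much to retrain" question the study investigates.
for p in model.encoder.parameters():
    p.requires_grad = False
small_mix, small_src = torch.rand(8, 257), torch.rand(8, 257)
train(model, model.mask_head.parameters(), small_mix, small_src)
```
Retraining only the head is the most conservative transfer setting; unfreezing progressively more of the encoder trades stability on small target datasets against flexibility.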
Related papers
- Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning [50.80758278865274]
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization.
The order in which the data from these domains is used for training can significantly affect the model's performance on each domain.
We investigate the influence of training order (or data mixing) in multi-domain learning using the concept of the Lie bracket of gradient vector fields.
arXiv Detail & Related papers (2025-01-26T15:12:06Z)
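For context, the Lie bracket of two gradient fields has a standard closed form; the display below is the textbook definition, written for per-domain losses L_A and L_B, and is not taken from the paper's own optimality criterion.
```latex
% Lie bracket of the per-domain gradient fields g_A = \nabla L_A and g_B = \nabla L_B
% (standard definition; the paper's exact criterion may differ).
\[
  [g_A, g_B](\theta)
  = \nabla^2 L_B(\theta)\,\nabla L_A(\theta)
  - \nabla^2 L_A(\theta)\,\nabla L_B(\theta)
\]
```
When the bracket vanishes, a small update on domain A followed by one on domain B reaches, to leading order, the same parameters as the reverse order, which is the sense in which training order stops mattering.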
- Self-Train Before You Transcribe [3.17829719401032]
We investigate the benefit of performing noisy student teacher training on recordings in the test set as a test-time adaptation approach.
A range of in-domain and out-of-domain datasets are used for experiments demonstrating large relative gains of up to 32.2%.
arXiv Detail & Related papers (2024-06-17T09:21:00Z)
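Concretely, the idea is to let a frozen teacher pseudo-label the test recordings and then train a student copy on noised versions of those recordings. The block below is a generic pseudo-labelling loop under that reading; the linear classifier and random features are stand-ins for a real speech recognizer and its test set, not the paper's system.
```python
# Generic sketch of noisy student-teacher self-training on the test set
# (test-time adaptation). The linear classifier and random features are
# placeholders for a real ASR model and its test recordings.
import copy
import torch
import torch.nn as nn

teacher = nn.Linear(40, 10)           # stand-in acoustic model (kept frozen)
student = copy.deepcopy(teacher)      # student starts from the teacher's weights
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

test_batch = torch.rand(32, 40)       # unlabeled test-set features

for _ in range(10):                   # a few adaptation steps
    with torch.no_grad():
        pseudo_labels = teacher(test_batch).argmax(dim=-1)  # teacher "transcribes"
    noisy_input = test_batch + 0.1 * torch.randn_like(test_batch)  # input noise
    opt.zero_grad()
    loss = loss_fn(student(noisy_input), pseudo_labels)
    loss.backward()
    opt.step()
# The adapted `student` is then used to transcribe the same test recordings.
```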
- An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging [6.363158395541767]
Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data.
In this study, we investigate and compare the performance of new self-supervised methods for music tagging.
arXiv Detail & Related papers (2024-04-14T07:56:08Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
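In this black-box setting only the pre-trained features are exposed, so adaptation happens on top of them. The sketch below shows the generic version of that idea, a learned affine map over frozen features feeding a downstream head; it uses placeholder tensors and does not reproduce NMTune's actual objective or regularizers.
```python
# Generic sketch of "black-box" adaptation: keep the (possibly noisily pre-trained)
# feature extractor frozen and learn a small affine map on top of its features.
# Random tensors stand in for extracted features; this is not NMTune's exact method.
import torch
import torch.nn as nn

feat_dim, n_classes = 512, 10
affine = nn.Linear(feat_dim, feat_dim)       # learned affine transform of the feature space
classifier = nn.Linear(feat_dim, n_classes)  # downstream head

opt = torch.optim.Adam(list(affine.parameters()) + list(classifier.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.rand(256, feat_dim)          # frozen pre-trained features (black box)
labels = torch.randint(0, n_classes, (256,))  # downstream labels

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(classifier(affine(features)), labels)
    loss.backward()
    opt.step()
```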
- Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems [3.997809845676912]
We show that self-supervised contrastive learning can mitigate the scarcity of annotated data from real music content.
We employ the snippet embeddings in the higher-level task of cross-modal piece identification.
In this work, we observe that the retrieval quality improves from 30% up to 100% when real music data is present.
arXiv Detail & Related papers (2023-09-21T14:54:48Z)
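Cross-modal snippet embeddings of this kind are commonly trained with a contrastive objective that pulls paired audio and sheet-image snippets together. The block below is a generic InfoNCE-style sketch of that objective with linear encoders and random inputs as placeholders; it is not the architecture or loss weighting used in the paper.
```python
# Generic cross-modal contrastive (InfoNCE-style) objective for paired audio and
# sheet-music snippets. Linear encoders and random inputs are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

audio_enc = nn.Linear(128, 64)    # stand-in audio snippet encoder
sheet_enc = nn.Linear(256, 64)    # stand-in sheet-image snippet encoder

audio = torch.rand(32, 128)       # batch of paired snippets
sheet = torch.rand(32, 256)

a = F.normalize(audio_enc(audio), dim=-1)
s = F.normalize(sheet_enc(sheet), dim=-1)

logits = a @ s.t() / 0.07                      # cosine similarities / temperature
targets = torch.arange(a.size(0))              # matching pairs lie on the diagonal
loss = 0.5 * (F.cross_entropy(logits, targets) +
              F.cross_entropy(logits.t(), targets))
loss.backward()                                # gradients flow into both encoders
```
The symmetric cross-entropy over the similarity matrix treats each item's true partner as the positive and all other items in the batch as negatives.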
- Supervised and Unsupervised Learning of Audio Representations for Music Understanding [9.239657838690226]
We show how the domain of pre-training datasets affects the adequacy of the resulting audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-07T20:07:35Z)
- Learning music audio representations via weak language supervision [14.335950077921435]
We design a multimodal architecture for music and language pre-training (MuLaP) optimised via a set of proxy tasks.
Weak supervision is provided in the form of noisy natural language descriptions conveying the overall musical content of the track.
We demonstrate the usefulness of our approach by comparing the performance of audio representations produced by the same audio backbone with different training strategies.
arXiv Detail & Related papers (2021-12-08T10:30:52Z)
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training [67.71228426496013]
We show that using target domain data during pre-training leads to large performance improvements across a variety of setups.
We find that pre-training on multiple domains improves performance generalization on domains not seen during training.
arXiv Detail & Related papers (2021-04-02T12:53:15Z)
- Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation [69.06413031969674]
Aug-Gen is a method of dataset augmentation for any music generation system trained on a resource-constrained domain.
We apply Aug-Gen to Transformer-based chorale generation in the style of J.S. Bach, and show that this allows for longer training and results in better generative output.
arXiv Detail & Related papers (2020-06-23T21:06:15Z)
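Read generically, this kind of generative augmentation interleaves training with sampling from the model and adds back only the samples that pass a quality filter, which is what lets training run longer on a small corpus. The toy loop below illustrates that pattern; the model, data, and filter are placeholders, and it is not a faithful reproduction of Aug-Gen.
```python
# Generic generative data-augmentation loop: periodically sample from the model,
# keep only samples passing a quality filter, and add them to the training pool.
# Toy model, data, and filter are placeholders.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                      # stand-in generative model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

train_pool = [torch.rand(16) for _ in range(32)]   # small "real" training set

def quality_ok(sample, threshold=0.6):
    """Placeholder filter; a real system would score musical plausibility."""
    return sample.mean().item() < threshold

for epoch in range(5):
    for x in train_pool:
        opt.zero_grad()
        loss = loss_fn(model(x), x)            # toy reconstruction objective
        loss.backward()
        opt.step()
    with torch.no_grad():
        generated = torch.sigmoid(model(torch.rand(8, 16)))   # sample candidates
    train_pool += [g for g in generated if quality_ok(g)]     # augment the pool
```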
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
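The recipe amounts to two rounds of continued masked-LM training before supervised fine-tuning: first on a broad in-domain corpus (domain-adaptive pretraining), then on the task's own unlabeled text (task-adaptive pretraining). The sketch below lays that pipeline out with the Hugging Face Trainer and tiny placeholder corpora; the corpus contents and training hyperparameters are assumptions, not the paper's settings.
```python
# Sketch of continued pretraining in two phases (DAPT then TAPT) before fine-tuning.
# Inline corpora and hyperparameters are placeholders, not the original setup.
from datasets import Dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)

def mlm_dataset(texts):
    ds = Dataset.from_dict({"text": texts})
    return ds.map(lambda b: tok(b["text"], truncation=True, max_length=64),
                  batched=True, remove_columns=["text"])

def continue_pretraining(model, texts, out_dir):
    """One additional masked-LM pass over an unlabeled corpus."""
    args = TrainingArguments(output_dir=out_dir, num_train_epochs=1,
                             per_device_train_batch_size=2)
    Trainer(model=model, args=args, train_dataset=mlm_dataset(texts),
            data_collator=collator).train()
    return model

domain_texts = ["placeholder in-domain document ..."] * 8    # e.g. domain-specific text
task_texts = ["placeholder unlabeled task example ..."] * 8  # the task's own inputs

model = continue_pretraining(model, domain_texts, "dapt")    # domain-adaptive phase
model = continue_pretraining(model, task_texts, "tapt")      # task-adaptive phase
# ...followed by standard supervised fine-tuning on the labeled task data.
```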
- Are we pretraining it right? Digging deeper into visio-linguistic pretraining [61.80511482405592]
We study how varying similarity between the pretraining dataset domain (textual and visual) and the downstream domain affects performance.
Surprisingly, we show that automatically generated data in a domain closer to the downstream task is a better choice for pretraining than "natural" data.
This suggests that despite the numerous recent efforts, vision & language pretraining does not quite work "out of the box" yet.
arXiv Detail & Related papers (2020-04-19T01:55:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.