Parameter-Efficient Transfer Learning for Music Foundation Models
- URL: http://arxiv.org/abs/2411.19371v1
- Date: Thu, 28 Nov 2024 20:50:40 GMT
- Title: Parameter-Efficient Transfer Learning for Music Foundation Models
- Authors: Yiwei Ding, Alexander Lerch
- Abstract summary: We investigate the use of parameter-efficient transfer learning (PETL) for music foundation models.
PETL methods outperform both probing and fine-tuning on music auto-tagging.
On key detection and tempo estimation, PETL methods achieve results similar to fine-tuning at significantly lower training cost.
- Score: 51.61531917413708
- Abstract: A growing number of music foundation models have recently been released, promising a general, mostly task-independent encoding of musical information. Common ways of adapting music foundation models to downstream tasks are probing and fine-tuning. These common transfer learning approaches, however, face challenges. Probing might lead to suboptimal performance because the pre-trained weights are frozen, while fine-tuning is computationally expensive and prone to overfitting. Our work investigates the use of parameter-efficient transfer learning (PETL) for music foundation models, which combines the advantages of probing and fine-tuning. We introduce three types of PETL methods: adapter-based methods, prompt-based methods, and reparameterization-based methods. These methods train only a small number of parameters and therefore do not require significant computational resources. Results show that PETL methods outperform both probing and fine-tuning on music auto-tagging. On key detection and tempo estimation, they achieve results similar to fine-tuning with significantly less training cost. However, the usefulness of the current generation of foundation models on key and tempo tasks is called into question by the similar results achieved by training a small model from scratch. Code is available at https://github.com/suncerock/peft-music/
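To make the "small number of parameters" claim concrete, here is a minimal sketch of one reparameterization-based idea: a LoRA-style low-rank update added to a frozen weight matrix. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which reparameterization method or which layer sizes were used, so the class name `LowRankAdaptedLinear`, the dimensions, and the rank are all hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class LowRankAdaptedLinear:
    """Frozen weight W plus a trainable low-rank update B @ A (LoRA-style).

    Hypothetical illustration: names and sizes are not taken from the paper.
    """

    def __init__(self, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weight; random values stand in for real weights.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors. B starts at zero, so the adapted layer initially
        # produces exactly the same output as the frozen layer.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                      # frozen path
        update = matvec(self.B, matvec(self.A, x))    # low-rank trainable path
        return [b + u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LowRankAdaptedLinear(d_out=256, d_in=256, rank=4)
print(layer.trainable_params(), layer.frozen_params())  # 2048 65536
```

In this toy setting only 2048 of the 65536 weights in the layer would receive gradient updates, which is where the reduced training cost of such methods comes from. Adapter-based and prompt-based methods reach a similarly small trainable set by different means (inserted bottleneck modules and learned input tokens, respectively).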
Related papers
- Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting [15.251425165987987]
Fine-tuning a pre-trained model on a downstream task often degrades its original capabilities.
We propose a sample weighting scheme for the fine-tuning data based on the pre-trained model's losses.
We empirically demonstrate the efficacy of our method on both language and vision tasks.
arXiv Detail & Related papers (2025-02-05T00:49:59Z)
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
- Fine-tuning can cripple your foundation model; preserving features may be the solution [87.35911633187204]
A fine-tuned model's ability to recognize concepts on tasks is reduced significantly compared to its pre-trained counterpart.
We propose a new fine-tuning method called $\textit{LDIFS}$ that, while learning new concepts related to the downstream task, allows a model to preserve its pre-trained knowledge as well.
arXiv Detail & Related papers (2023-08-25T11:49:51Z)
- Strong Baselines for Parameter Efficient Few-Shot Fine-tuning [50.83426196335385]
Few-shot classification (FSC) entails learning novel classes given only a few examples per class after a pre-training (or meta-training) phase.
Recent works have shown that simply fine-tuning a pre-trained Vision Transformer (ViT) on new test classes is a strong approach for FSC.
Fine-tuning ViTs, however, is expensive in time, compute and storage.
This has motivated the design of parameter efficient fine-tuning (PEFT) methods which fine-tune only a fraction of the Transformer's parameters.
arXiv Detail & Related papers (2023-04-04T16:14:39Z)
- Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion [44.056153052137674]
We propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models.
Experimental results show that the proposed method can achieve better performance on speech recognition task than existing algorithms.
arXiv Detail & Related papers (2022-11-04T19:03:45Z)
- Robust Few-shot Learning Without Using any Adversarial Samples [19.34427461937382]
A few efforts have been made to combine the few-shot problem with the robustness objective using sophisticated Meta-Learning techniques.
We propose a simple but effective alternative that does not require any adversarial samples.
Inspired by the cognitive decision-making process in humans, we enforce high-level feature matching between the base class data and their corresponding low-frequency samples.
arXiv Detail & Related papers (2022-11-03T05:58:26Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Few-Shot Lifelong Learning [35.05196800623617]
Few-Shot Lifelong Learning enables deep learning models to perform lifelong/continual learning on few-shot data.
Our method selects very few parameters from the model for training every new set of classes instead of training the full model.
We experimentally show that our method significantly outperforms existing methods on the miniImageNet, CIFAR-100, and CUB-200 datasets.
arXiv Detail & Related papers (2021-03-01T13:26:57Z)
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)