MERTech: Instrument Playing Technique Detection Using Self-Supervised
Pretrained Model With Multi-Task Finetuning
- URL: http://arxiv.org/abs/2310.09853v1
- Date: Sun, 15 Oct 2023 15:00:00 GMT
- Title: MERTech: Instrument Playing Technique Detection Using Self-Supervised
Pretrained Model With Multi-Task Finetuning
- Authors: Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu,
Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
- Abstract summary: We propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks.
Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets.
- Score: 17.307289537499184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instrument playing techniques (IPTs) constitute a pivotal component of
musical expression. However, the development of automatic IPT detection methods
suffers from limited labeled data and inherent class imbalance issues. In this
paper, we propose to apply a self-supervised learning model pre-trained on
large-scale unlabeled music data and finetune it on IPT detection tasks. This
approach addresses data scarcity and class imbalance challenges. Recognizing
the significance of pitch in capturing the nuances of IPTs and the importance
of onset in locating IPT events, we investigate multi-task finetuning with
pitch and onset detection as auxiliary tasks. Additionally, we apply a
post-processing approach for event-level prediction, where an IPT activation
initiates an event only if the onset output confirms an onset in that frame.
Our method outperforms prior approaches in both frame-level and event-level
metrics across multiple IPT benchmark datasets. Further experiments demonstrate
the efficacy of multi-task finetuning on each IPT class.
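To make the two components above concrete, here is a minimal sketch of multi-task finetuning heads on top of a frame-level self-supervised encoder, together with the onset-gated event decoding described in the abstract. The head and loss choices, the loss weights, and the 0.5 thresholds are illustrative assumptions, not the authors' released implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskIPTHeads(nn.Module):
    """Frame-level heads for IPT, pitch, and onset prediction on top of a
    self-supervised encoder's frame features (a sketch; layer sizes are assumptions)."""
    def __init__(self, feat_dim, n_ipt_classes, n_pitch_classes):
        super().__init__()
        self.ipt_head = nn.Linear(feat_dim, n_ipt_classes)      # main task
        self.pitch_head = nn.Linear(feat_dim, n_pitch_classes)  # auxiliary task
        self.onset_head = nn.Linear(feat_dim, 1)                # auxiliary task

    def forward(self, frame_feats):                      # (batch, frames, feat_dim)
        return (self.ipt_head(frame_feats),
                self.pitch_head(frame_feats),
                self.onset_head(frame_feats).squeeze(-1))

def multitask_loss(ipt_logits, pitch_logits, onset_logits,
                   ipt_target, pitch_target, onset_target,
                   w_pitch=1.0, w_onset=1.0):            # loss weights are assumptions
    return (F.binary_cross_entropy_with_logits(ipt_logits, ipt_target)
            + w_pitch * F.binary_cross_entropy_with_logits(pitch_logits, pitch_target)
            + w_onset * F.binary_cross_entropy_with_logits(onset_logits, onset_target))

def events_from_frames(ipt_prob, onset_prob, thr_ipt=0.5, thr_onset=0.5):
    """Onset-gated post-processing: an IPT activation opens an event only in a
    frame where the onset output also exceeds its threshold."""
    events, open_event = [], {}
    n_frames, n_classes = ipt_prob.shape
    for t in range(n_frames):
        for c in range(n_classes):
            active = ipt_prob[t, c] >= thr_ipt
            if active and c not in open_event and onset_prob[t] >= thr_onset:
                open_event[c] = t                        # onset confirmed: start event
            elif not active and c in open_event:
                events.append((c, open_event.pop(c), t)) # activation ended: close event
    events += [(c, s, n_frames) for c, s in open_event.items()]  # close remaining events
    return events
```
The decoding loop implements the stated rule directly: an IPT activation starts an event only in a frame where the onset head also fires, which suppresses spurious event onsets inside sustained activations.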
Related papers
- PPT: Pretraining with Pseudo-Labeled Trajectories for Motion Forecasting [90.47748423913369]
State-of-the-art motion forecasting models rely on large curated datasets with manually annotated or heavily post-processed trajectories.
PPT is a simple and scalable alternative that uses unprocessed and diverse trajectories automatically generated by off-the-shelf 3D detectors and tracking.
It achieves strong performance across standard benchmarks, particularly in low-data regimes, and in cross-domain, end-to-end, and multi-class settings.
arXiv Detail & Related papers (2024-12-09T13:48:15Z)
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that draws on both training-required and training-free approaches.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
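The key-value memory mentioned above can be pictured as a small cache of normalized features (keys) paired with pseudo-labels (values), queried by similarity at test time. The cache size, FIFO eviction, similarity measure, and blending weight in the sketch below are assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

class KeyValueMemory:
    """Generic feature cache for test-time adaptation: keys are L2-normalized
    features, values are (soft) pseudo-labels. A sketch, not BoostAdapter itself."""
    def __init__(self, capacity=64):
        self.capacity = capacity                          # assumed cache size
        self.keys, self.values = [], []

    def add(self, feature, pseudo_label):
        self.keys.append(F.normalize(feature, dim=-1))
        self.values.append(pseudo_label)
        if len(self.keys) > self.capacity:                # FIFO eviction (an assumption)
            self.keys.pop(0)
            self.values.pop(0)

    def retrieve(self, feature, temperature=0.1):
        """Similarity-weighted vote over the cached pseudo-labels."""
        if not self.keys:
            return None
        keys = torch.stack(self.keys)                     # (n, d)
        sims = keys @ F.normalize(feature, dim=-1)        # cosine similarities, (n,)
        weights = torch.softmax(sims / temperature, dim=0)
        return (weights.unsqueeze(-1) * torch.stack(self.values)).sum(dim=0)

def adapted_probs(zero_shot_logits, feature, memory, alpha=0.5):
    """Blend cached evidence with the zero-shot prediction (blend weight is assumed)."""
    probs = torch.softmax(zero_shot_logits, dim=-1)
    cached = memory.retrieve(feature)
    return probs if cached is None else (1 - alpha) * probs + alpha * cached
```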
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
- Rethinking Class-incremental Learning in the Era of Large Pre-trained Models via Test-Time Adaptation [20.62749699589017]
Class-incremental learning (CIL) is a challenging task that involves sequentially learning to categorize classes from new tasks.
We propose Test-Time Adaptation for Class-Incremental Learning (TTACIL), which first fine-tunes pre-trained models (PTMs) using Adapters on the first task.
TTACIL does not undergo any forgetting, while each task still benefits from the rich PTM features.
arXiv Detail & Related papers (2023-10-17T13:06:39Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm [132.9949120482274]
This paper focuses on the selection of samples for annotation in the pretraining-finetuning paradigm.
We propose a novel method called ActiveFT for the active finetuning task, which selects a subset of data whose distribution is similar to that of the entire unlabeled pool.
Extensive experiments show that ActiveFT outperforms baselines in both performance and efficiency on image classification and semantic segmentation.
arXiv Detail & Related papers (2023-03-25T07:17:03Z)
- Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism [6.2680838592065715]
We formulate a frame-level multi-label classification problem and apply it to Guzheng, a Chinese plucked string instrument.
Because different IPTs vary greatly in length, we propose a new method that addresses this problem using a multi-scale network and self-attention.
Our approach outperforms existing works by a large margin, indicating its effectiveness in IPT detection.
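As a point of reference for what frame-level multi-label IPT detection looks like, below is a minimal sketch: each frame receives an independent sigmoid score per technique class, trained with binary cross-entropy. The GRU frontend and the class count are placeholders; the cited paper's multi-scale network and self-attention mechanism are not reproduced here.
```python
import torch
import torch.nn as nn

class FrameLevelMultiLabelTagger(nn.Module):
    """Per-frame multi-label IPT classifier: every frame gets an independent
    sigmoid score per technique, so overlapping, variable-length techniques can
    be active at the same time. A simplified sketch, not the cited model."""
    def __init__(self, n_mel_bins=128, hidden=256, n_ipt_classes=8):  # class count is a placeholder
        super().__init__()
        self.frontend = nn.GRU(n_mel_bins, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_ipt_classes)

    def forward(self, mel):                       # mel: (batch, frames, n_mel_bins)
        h, _ = self.frontend(mel)                 # (batch, frames, 2 * hidden)
        return self.classifier(h)                 # logits: (batch, frames, n_ipt_classes)

# training uses a per-frame, per-class binary cross-entropy
model = FrameLevelMultiLabelTagger()
mel = torch.randn(2, 400, 128)                    # 2 clips, 400 frames of log-mel features
targets = torch.randint(0, 2, (2, 400, 8)).float()
loss = nn.functional.binary_cross_entropy_with_logits(model(mel), targets)
```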
arXiv Detail & Related papers (2023-03-23T13:52:42Z)
- How Does In-Context Learning Help Prompt Tuning? [55.78535874154915]
Fine-tuning large language models is becoming ever more impractical due to their rapidly growing scale.
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model.
Recently, Singhal et al. (2022) proposed "instruction prompt tuning" (IPT), which combines PT with in-context learning (ICL) by concatenating a natural language demonstration with learned prompt embeddings.
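For readers unfamiliar with the mechanism, the sketch below shows prompt tuning in its standard form: a small matrix of learnable prompt embeddings is prepended to the token embeddings of an otherwise frozen backbone. The wrapper interface and prompt length are assumptions, not the cited paper's code.
```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prompt tuning sketch: only the prompt embeddings are trainable; the
    backbone (any module that accepts embedded inputs) stays frozen."""
    def __init__(self, backbone, embed_dim, n_prompt_tokens=20):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                       # the model itself stays frozen
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, embed_dim) * 0.02)

    def forward(self, token_embeds):                      # (batch, seq, embed_dim)
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prompt, token_embeds], dim=1))
```
Instruction prompt tuning, as summarized above, additionally concatenates an embedded natural-language demonstration with these learned prompt embeddings.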
arXiv Detail & Related papers (2023-02-22T17:45:12Z)
- Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning? [37.522581151997734]
Prompt tuning (PT), which only tunes the embeddings of an additional sequence of tokens per task, has shown remarkable performance in few-shot learning.
We study meta prompt tuning (MPT) to explore how meta-learning can help improve (if it can) cross-task generalization.
arXiv Detail & Related papers (2023-02-16T08:37:22Z)
- SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning [28.29889045842277]
Multitask prompted learning can improve generalization by training on a diverse set of tasks at once.
We propose SPT, a semi-parametric prompt tuning method for multitask prompted learning.
arXiv Detail & Related papers (2022-12-21T11:18:09Z)
- SEPT: Towards Scalable and Efficient Visual Pre-Training [11.345844145289524]
Self-supervised pre-training has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance.
We build a task-specific self-supervised pre-training framework based on a simple hypothesis: pre-training on unlabeled samples whose distribution is similar to that of the target task can bring substantial performance gains.
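One straightforward way to act on that hypothesis is to rank the unlabeled pool by feature similarity to the target-task data and pre-train on the top-ranked subset, as sketched below; the feature space, centroid similarity, and top-k rule are illustrative assumptions, not necessarily the paper's retrieval procedure.
```python
import torch
import torch.nn.functional as F

def select_pretraining_subset(pool_feats, target_feats, k):
    """Rank unlabeled pool samples by cosine similarity to the centroid of the
    target-task features and keep the top-k (a hedged illustration only)."""
    pool = F.normalize(pool_feats, dim=-1)                    # (n_pool, d)
    centroid = F.normalize(target_feats.mean(dim=0), dim=-1)  # (d,)
    scores = pool @ centroid                                  # similarity to target centroid
    return torch.topk(scores, k).indices                      # samples to pre-train on

# usage with any off-the-shelf feature extractor (features assumed precomputed)
pool_feats = torch.randn(10000, 512)
target_feats = torch.randn(200, 512)
subset_idx = select_pretraining_subset(pool_feats, target_feats, k=2000)
```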
arXiv Detail & Related papers (2022-12-11T11:02:11Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
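A simplified illustration of the weighted-training idea: each source task's loss is weighted according to a representation-based distance to the target task. The mean-feature distance and softmax weighting below are assumptions made for the sketch; TAWT's actual estimator and its guarantees are given in the paper.
```python
import torch

def task_weights_from_distances(source_feats, target_feats, temperature=1.0):
    """Representation-based task weights: source tasks whose average representation
    is closer to the target task receive larger training weight (simplified)."""
    target_center = target_feats.mean(dim=0)
    dists = torch.stack([(f.mean(dim=0) - target_center).norm() for f in source_feats])
    return torch.softmax(-dists / temperature, dim=0)

def weighted_crosstask_loss(task_losses, weights):
    """Cross-task objective: a weighted sum of the per-source-task losses."""
    return (weights * torch.stack(task_losses)).sum()

# usage: weights come from shared-encoder features of each source task and the target task
src = [torch.randn(100, 64), torch.randn(80, 64), torch.randn(120, 64)]
tgt = torch.randn(30, 64)
w = task_weights_from_distances(src, tgt)
losses = [torch.tensor(0.9), torch.tensor(1.4), torch.tensor(0.7)]
total = weighted_crosstask_loss(losses, w)
```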