Dynamic Task Vector Grouping for Efficient Multi-Task Prompt Tuning
- URL: http://arxiv.org/abs/2503.18063v1
- Date: Sun, 23 Mar 2025 13:09:04 GMT
- Title: Dynamic Task Vector Grouping for Efficient Multi-Task Prompt Tuning
- Authors: Pieyi Zhang, Richong Zhang, Zhijie Nie
- Abstract summary: Multi-task prompt tuning utilizes multiple high-resource source tasks to improve performance on low-resource target tasks. Existing approaches transfer the soft prompt trained by combining all source tasks, or by a single ``highly similar'' source task, only once. We find that the optimal transfer performance often comes from a combination of source tasks that is neither one nor all.
- Score: 20.37803751979975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task prompt tuning utilizes multiple high-resource source tasks to improve performance on low-resource target tasks. Existing approaches transfer the soft prompt trained by combining all source tasks or a single ``highly similar'' source task one time only. However, we find that the optimal transfer performance often comes from a combination of source tasks, which is neither one nor all. Further, we find that the similarity between source and target tasks also changes dynamically during fine-tuning after transferring, making a similarity calculation performed only at initialization inadequate. To address these issues, we propose a method called Dynamic Task Vector Grouping (DTVG), whose core ideas are (1) measuring task similarity with task vectors instead of soft prompts, (2) grouping the optimal source task combination based on two metrics, {\it target similarity} and {\it knowledge consistency}, and (3) dynamically updating the combination at each iteration step. Extensive experiments on 26 NLP datasets under different settings demonstrate that DTVG effectively groups similar source tasks while reducing negative transfer, achieving state-of-the-art performance.
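The abstract describes an algorithm but ships no code; the Python sketch below is only meant to make the three core ideas concrete. It assumes task vectors are the difference between tuned soft prompts and their initialization, uses cosine similarity for both metrics, and averages candidate vectors for the consistency check; the helper names, thresholds, and merge rule are illustrative assumptions, not DTVG's actual formulation.

```python
import torch
import torch.nn.functional as F


def task_vector(tuned_prompt: torch.Tensor, init_prompt: torch.Tensor) -> torch.Tensor:
    # Task vector: the change of a soft prompt relative to its initialization,
    # used here in place of the raw soft prompt when measuring task similarity.
    return (tuned_prompt - init_prompt).flatten()


def regroup(source_vecs: dict[str, torch.Tensor],
            target_vec: torch.Tensor,
            sim_threshold: float = 0.0,
            cons_threshold: float = 0.0) -> list[str]:
    """One dynamic regrouping step, intended to be re-run at every training
    iteration as the target task vector evolves during fine-tuning."""
    # Target similarity: keep source tasks whose vectors point in a direction
    # similar to the current target task vector.
    candidates = {
        name: vec for name, vec in source_vecs.items()
        if F.cosine_similarity(vec, target_vec, dim=0) > sim_threshold
    }
    if not candidates:
        return []
    # Knowledge consistency: drop sources that conflict with the merged
    # (here: averaged) vector of the remaining candidates.
    merged = torch.stack(list(candidates.values())).mean(dim=0)
    return [
        name for name, vec in candidates.items()
        if F.cosine_similarity(vec, merged, dim=0) > cons_threshold
    ]
```

In a full training loop, the selected group would be recomputed at every step from the current prompts, so sources that stop being helpful as the target prompt evolves are dropped rather than kept from a one-time initial selection.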
Related papers
- Multi-Task Model Merging via Adaptive Weight Disentanglement [69.7292615212444]
We introduce an Adaptive Weight Disentanglement method for model merging.
We successfully extract redundant vectors, and after their subtraction, the task vectors retain robust performance.
arXiv Detail & Related papers (2024-11-27T20:08:55Z) - Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer [0.6053347262128919]
We introduce Task Prompt Vectors, created by taking the element-wise difference between the weights of tuned soft prompts and their random initialization.
We show that task prompt vectors can be used in low-resource settings to effectively initialize prompt tuning on similar tasks.
This allows prompt arithmetic with the pre-trained vectors from different tasks (a minimal sketch appears after this list).
arXiv Detail & Related papers (2024-08-02T09:00:03Z) - DMTG: One-Shot Differentiable Multi-Task Grouping [32.72240053032646]
We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG).
We propose to simultaneously identify the best task groups from 2^N candidates and train the model weights in one shot, with high-order task affinity fully exploited.
arXiv Detail & Related papers (2024-07-06T13:54:00Z) - Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z) - Identification of Negative Transfers in Multitask Learning Using
Surrogate Models [29.882265735630046]
Multitask learning is widely used to train a low-resource target task by augmenting it with multiple related source tasks.
A critical problem in multitask learning is identifying subsets of source tasks that would benefit the target task.
We introduce an efficient procedure to address this problem via surrogate modeling.
arXiv Detail & Related papers (2023-03-25T23:16:11Z) - Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning [43.639430661322585]
We propose multitask prompt tuning (MPT).
MPT learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts.
We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task.
arXiv Detail & Related papers (2023-03-06T03:25:59Z) - ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning [59.08197876733052]
Auxiliary-Task Learning (ATL) aims to improve the performance of the target task by leveraging the knowledge obtained from related tasks.
Sometimes, learning multiple tasks simultaneously results in lower accuracy than learning only the target task, a phenomenon known as negative transfer.
ForkMerge is a novel approach that periodically forks the model into multiple branches and automatically searches for the varying task weights.
arXiv Detail & Related papers (2023-01-30T02:27:02Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) architectures improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z) - Distributed Primal-Dual Optimization for Online Multi-Task Learning [22.45069527817333]
We propose an adaptive primal-dual algorithm, which captures task-specific noise in adversarial learning and carries out a projection-free update with runtime efficiency.
Our model is well-suited to decentralized, periodically connected tasks, as it allows energy-starved or bandwidth-constrained tasks to postpone their updates.
Empirical results confirm that the proposed model is highly effective on various real-world datasets.
arXiv Detail & Related papers (2020-04-02T23:36:07Z)
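For the ``Task Prompt Vectors'' entry above, here is a minimal sketch of the described prompt arithmetic, assuming a task prompt vector is simply the element-wise difference between a tuned soft prompt and a shared random initialization; the plain summation used to combine vectors is an assumption, not necessarily the paper's rule.

```python
import torch


def task_prompt_vector(tuned_prompt: torch.Tensor, init_prompt: torch.Tensor) -> torch.Tensor:
    # Element-wise difference between a tuned soft prompt and its random initialization.
    return tuned_prompt - init_prompt


def init_target_prompt(init_prompt: torch.Tensor, vectors: list[torch.Tensor]) -> torch.Tensor:
    # Prompt arithmetic: add pre-trained task prompt vectors from similar source tasks
    # back onto the shared initialization to warm-start prompt tuning on a
    # low-resource target task. Summation is an assumed combination rule.
    return init_prompt + torch.stack(vectors).sum(dim=0)
```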