Rethinking Hard-Parameter Sharing in Multi-Task Learning
- URL: http://arxiv.org/abs/2107.11359v1
- Date: Fri, 23 Jul 2021 17:26:40 GMT
- Title: Rethinking Hard-Parameter Sharing in Multi-Task Learning
- Authors: Lijun Zhang, Qizheng Yang, Xiao Liu, Hui Guan
- Abstract summary: Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
- Score: 20.792654758645302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hard parameter sharing in multi-task learning (MTL) allows tasks to share
some of the model parameters, reducing storage cost and improving prediction
accuracy. The common sharing practice is to share bottom layers of a deep
neural network among tasks while using separate top layers for each task. In
this work, we revisit this common practice via an empirical study on
fine-grained image classification tasks and make two surprising observations.
(1) Using separate bottom-layer parameters could achieve significantly better
performance than the common practice, and this phenomenon holds for different
numbers of tasks jointly trained on different backbone architectures with
different quantities of task-specific parameters. (2) A multi-task model with a
small proportion of task-specific parameters from bottom layers can achieve
competitive performance with independent models trained on each task separately
and outperform a state-of-the-art MTL framework. Our observations suggest that
people rethink the current sharing paradigm and adopt the new strategy of using
separate bottom-layer parameters as a stronger baseline for model design in
MTL.
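The two sharing layouts contrasted in the abstract can be illustrated with a minimal parameter-counting sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy four-layer dense backbone `LAYERS`, the helper names `layer_params`, `build_plan`, and `count_params`, the choice of which layer indices are task-specific, and the simplification that all tasks share the same output size.

```python
# Hypothetical toy backbone: (in_features, out_features) of each dense layer,
# listed from bottom (input side) to top (output side). Not from the paper.
LAYERS = [(64, 128), (128, 128), (128, 128), (128, 10)]


def layer_params(shape):
    """Number of weights plus biases in one dense layer."""
    fan_in, fan_out = shape
    return fan_in * fan_out + fan_out


def build_plan(layers, task_specific, num_tasks):
    """Map each task to the parameter-set name it uses at every layer.

    Layers whose index is in `task_specific` get one copy per task;
    every other layer points at a single copy shared by all tasks.
    """
    return {
        t: [
            f"layer{i}/task{t}" if i in task_specific else f"layer{i}/shared"
            for i in range(len(layers))
        ]
        for t in range(num_tasks)
    }


def count_params(layers, task_specific, num_tasks):
    """Return (total, task_specific_total) parameter counts for a layout."""
    shared = sum(
        layer_params(s) for i, s in enumerate(layers) if i not in task_specific
    )
    specific = num_tasks * sum(
        layer_params(s) for i, s in enumerate(layers) if i in task_specific
    )
    return shared + specific, specific


NUM_TASKS = 3
LAYOUTS = {
    # Common practice: shared bottom layers, separate top layers.
    "shared-bottom": {2, 3},
    # Alternative discussed in the abstract: separate bottom layers,
    # with the layers above them shared (assumed split point).
    "separate-bottom": {0},
}

for name, task_specific in LAYOUTS.items():
    total, specific = count_params(LAYERS, task_specific, NUM_TASKS)
    print(f"{name}: total={total}, task-specific={specific} "
          f"({specific / total:.1%} of all parameters)")
    print("  task 0 uses:", build_plan(LAYERS, task_specific, NUM_TASKS)[0])
```

The sketch only counts and routes parameters; it does not train anything, so it says nothing about accuracy. Its purpose is to make the phrase "a small proportion of task-specific parameters from bottom layers" concrete for one assumed architecture.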
Related papers
- Pilot: Building the Federated Multimodal Instruction Tuning Framework [79.56362403673354]
Our framework integrates two stages of "adapter on adapter" into the connector of the vision encoder and the LLM.
In stage 1, we extract task-specific features and client-specific features from visual information.
In stage 2, we build the cross-task Mixture-of-Adapters (CT-MoA) module to perform cross-task interaction.
arXiv Detail & Related papers (2025-01-23T07:49:24Z)
- Tint Your Models Task-wise for Improved Multi-task Model Merging [17.496018757317824]
We propose Model Tinting, a test-time approach that introduces a single task-specific layer for each task as trainable adjustments.
Our method jointly trains merging coefficients and task-specific layers, which effectively reduces task conflicts with minimal additional costs.
Our method achieves state-of-the-art performance across both computer vision and natural language processing tasks.
arXiv Detail & Related papers (2024-12-26T07:42:06Z)
- BoRA: Bayesian Hierarchical Low-Rank Adaption for Multi-Task Large Language Models [0.0]
This paper introduces Bayesian Hierarchical Low-Rank Adaption (BoRA), a novel method for finetuning multi-task Large Language Models (LLMs).
BoRA addresses trade-offs by leveraging a Bayesian hierarchical model that allows tasks to share information through global hierarchical priors.
Our experimental results show that BoRA outperforms both individual and unified model approaches, achieving lower perplexity and better generalization across tasks.
arXiv Detail & Related papers (2024-07-08T06:38:50Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models that were fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize this matching process during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which has been shown to significantly degrade the single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.