Rethinking Hard-Parameter Sharing in Multi-Task Learning
- URL: http://arxiv.org/abs/2107.11359v1
- Date: Fri, 23 Jul 2021 17:26:40 GMT
- Title: Rethinking Hard-Parameter Sharing in Multi-Task Learning
- Authors: Lijun Zhang, Qizheng Yang, Xiao Liu, Hui Guan
- Abstract summary: Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
- Score: 20.792654758645302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hard parameter sharing in multi-task learning (MTL) allows tasks to share
some of the model parameters, reducing storage cost and improving prediction
accuracy. The common sharing practice is to share bottom layers of a deep
neural network among tasks while using separate top layers for each task. In
this work, we revisit this common practice via an empirical study on
fine-grained image classification tasks and make two surprising observations.
(1) Using separate bottom-layer parameters could achieve significantly better
performance than the common practice, and this phenomenon holds across different
numbers of tasks jointly trained on different backbone architectures with
different quantities of task-specific parameters. (2) A multi-task model with a
small proportion of task-specific parameters from bottom layers can achieve
competitive performance with independent models trained on each task separately
and outperform a state-of-the-art MTL framework. Our observations suggest
rethinking the current sharing paradigm and adopting the new strategy of using
separate bottom-layer parameters as a stronger baseline for model design in
MTL.
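To make the contrast concrete, the following is a minimal PyTorch sketch of the two sharing strategies; it is illustrative rather than the paper's code, and the backbone, layer sizes, and two conv stages are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(), nn.MaxPool2d(2))

class SharedBottomMTL(nn.Module):
    """Common practice: shared bottom layers, task-specific top layers."""
    def __init__(self, num_tasks, num_classes):
        super().__init__()
        self.bottom = conv_block(3, 32)                      # shared by all tasks
        self.tops = nn.ModuleList([nn.Sequential(            # one top per task
            conv_block(32, 64), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, num_classes))
            for _ in range(num_tasks)])

    def forward(self, x, task_id):
        return self.tops[task_id](self.bottom(x))

class SeparateBottomMTL(nn.Module):
    """The strategy the paper finds stronger: task-specific bottom layers,
    shared upper layers, with a small classifier head kept per task."""
    def __init__(self, num_tasks, num_classes):
        super().__init__()
        self.bottoms = nn.ModuleList([conv_block(3, 32)      # one bottom per task
                                      for _ in range(num_tasks)])
        self.top = nn.Sequential(conv_block(32, 64),         # shared by all tasks
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleList([nn.Linear(64, num_classes)
                                    for _ in range(num_tasks)])

    def forward(self, x, task_id):
        return self.heads[task_id](self.top(self.bottoms[task_id](x)))

# logits = SeparateBottomMTL(num_tasks=3, num_classes=10)(torch.randn(8, 3, 64, 64), task_id=0)
```

What moves between the two variants is where the task-specific parameters sit, not how many tasks or heads there are.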
Related papers
- BoRA: Bayesian Hierarchical Low-Rank Adaption for Multi-task Large Language Models [0.0]
This paper introduces Bayesian Hierarchical Low-Rank Adaption (BoRA), a novel method for fine-tuning multi-task Large Language Models (LLMs).
BoRA addresses the trade-off between training separate per-task models and one unified model by leveraging a Bayesian hierarchical model that allows tasks to share information through global hierarchical priors.
Our experimental results show that BoRA outperforms both individual and unified model approaches, achieving lower perplexity and better generalization across tasks.
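One way to read the "global hierarchical priors" is, in MAP form, as an L2 penalty pulling each task's low-rank adapter toward a shared, learned global adapter. The sketch below is a hypothetical rendering of that reading, not BoRA's actual implementation; the class, the penalty form, and `strength` are assumptions.

```python
import torch
import torch.nn as nn

class HierarchicalLoRA(nn.Module):
    """Per-task low-rank adapters tied together by a learned global adapter."""
    def __init__(self, d_in, d_out, rank, num_tasks):
        super().__init__()
        self.A_global = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B_global = nn.Parameter(torch.zeros(rank, d_out))
        self.A = nn.ParameterList([nn.Parameter(torch.randn(d_in, rank) * 0.01)
                                   for _ in range(num_tasks)])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(rank, d_out))
                                   for _ in range(num_tasks)])

    def forward(self, x, task_id):
        # low-rank update for this task (added to the frozen base layer's output)
        return x @ self.A[task_id] @ self.B[task_id]

    def prior_penalty(self, strength=1e-3):
        # hierarchical Gaussian prior, MAP form: penalize each task's
        # adapter for drifting away from the shared global adapter
        loss = sum((A - self.A_global).pow(2).sum() +
                   (B - self.B_global).pow(2).sum()
                   for A, B in zip(self.A, self.B))
        return strength * loss
```

Training would then minimize the per-task loss plus `adapter.prior_penalty()`, so tasks with little data are shrunk toward what the other tasks agree on.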
arXiv Detail & Related papers (2024-07-08T06:38:50Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models that were fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning [1.0705399532413615]
Recommendation system algorithms based on multi-task learning (MTL) are the major method for Internet operators to understand users and predict their behaviors.
Traditional models use shared-bottom architectures and gating experts to realize shared representation learning and information differentiation (see the sketch below).
We propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously.
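The shared-bottom-plus-gating-experts pattern that the summary names as the traditional baseline (the pattern DEPHN builds on, not DEPHN itself) looks roughly like the following MMoE-style sketch; expert count and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GatedExpertsMTL(nn.Module):
    """Shared experts mixed by one softmax gate per task (MMoE-style)."""
    def __init__(self, d_in, d_hidden, num_experts, num_tasks):
        super().__init__()
        self.experts = nn.ModuleList([nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
                                      for _ in range(num_experts)])
        self.gates = nn.ModuleList([nn.Linear(d_in, num_experts)   # one gate per task
                                    for _ in range(num_tasks)])
        self.towers = nn.ModuleList([nn.Linear(d_hidden, 1)        # one tower per task
                                     for _ in range(num_tasks)])

    def forward(self, x):
        expert_outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, H)
        preds = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)            # (B, E, 1)
            preds.append(tower((w * expert_outs).sum(dim=1)))           # task prediction
        return preds

# preds = GatedExpertsMTL(d_in=16, d_hidden=32, num_experts=4, num_tasks=2)(torch.randn(8, 16))
```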
arXiv Detail & Related papers (2023-07-24T04:29:00Z)
- Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners [74.92558307689265]
We propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').
We optimize the matching of experts to tasks during the training of a single model.
Experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
arXiv Detail & Related papers (2022-12-15T18:59:52Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
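"Disentangled" sparsification can be pictured as scoring each weight under every task's loss separately and keeping any weight that matters to at least one task, instead of pruning by a summed multi-task gradient. The sketch below is a simplified illustration of that idea, not DiSparse's exact criterion; the first-order saliency measure and the max-vote are assumptions.

```python
import torch

def disentangled_prune_mask(param, per_task_grads, keep_ratio=0.5):
    """Build a binary keep/prune mask from per-task importance scores.

    param:          a weight tensor shared across tasks
    per_task_grads: gradients of `param`, one tensor per task, each
                    computed from that task's loss in isolation
    """
    # first-order saliency |w * g| estimated independently per task
    scores = torch.stack([(param * g).abs() for g in per_task_grads])
    # a weight survives if ANY task considers it important
    score = scores.max(dim=0).values
    k = max(1, int(keep_ratio * score.numel()))
    threshold = score.flatten().topk(k).values.min()
    return (score >= threshold).float()   # 1 = keep, 0 = prune
```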
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
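A task-aware gating function can be as simple as adding a learned task embedding to the router's input, so different tasks favor different experts while each example still activates only its top-k of them. The sketch below is a minimal illustration under that assumption, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TaskAwareMoE(nn.Module):
    """Sparse MoE layer whose router sees both the example and the task."""
    def __init__(self, d_model, num_experts, num_tasks, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model)
                                      for _ in range(num_experts)])
        self.task_emb = nn.Embedding(num_tasks, d_model)
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x, task_ids):
        # routing scores depend on the input AND the task identity
        logits = self.router(x + self.task_emb(task_ids))               # (B, E)
        weights, idx = torch.topk(torch.softmax(logits, dim=-1), self.top_k)
        out = torch.zeros_like(x)
        for b in range(x.size(0)):        # only the top-k experts run per example
            for slot in range(self.top_k):
                e = int(idx[b, slot])
                out[b] += weights[b, slot] * self.experts[e](x[b])
        return out

# out = TaskAwareMoE(64, 8, 3)(torch.randn(4, 64), torch.zeros(4, dtype=torch.long))
```

All experts exist as parameters, but each forward pass touches only `top_k` of them, which is how the parameter count grows while per-example compute stays near that of a dense layer.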
arXiv Detail & Related papers (2022-04-16T00:56:12Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
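The adaptive selection can be sketched as a learnable per-layer gate blending a frozen shared weight with a task-specific residual, with a sparsity penalty keeping most gates closed so that only a few layers turn task-specific. This is a simplified sketch in the spirit of the summary, not TAPS's exact formulation; the sigmoid gate and penalty are assumptions.

```python
import torch
import torch.nn as nn

class AdaptiveSharingLinear(nn.Module):
    """Linear layer that learns whether to stay shared or go task-specific."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.shared = nn.Linear(d_in, d_out)
        for p in self.shared.parameters():    # base weights stay shared/frozen
            p.requires_grad_(False)
        self.delta = nn.Parameter(torch.zeros(d_out, d_in))  # task-specific residual
        self.score = nn.Parameter(torch.tensor(-2.0))        # gate logit, starts closed

    def forward(self, x):
        gate = torch.sigmoid(self.score)             # soft gate in (0, 1)
        w = self.shared.weight + gate * self.delta   # blend shared + task weights
        return nn.functional.linear(x, w, self.shared.bias)

    def gate_penalty(self):
        # sparsity pressure: every layer that opens its gate costs something
        return torch.sigmoid(self.score)

# total_loss = task_loss + 1e-2 * sum(layer.gate_penalty() for layer in adaptive_layers)
```

After training, layers whose gate stays near zero keep the shared weights, so the task-specific parameter budget is exactly the few layers that opened.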
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.