Task Adaptive Parameter Sharing for Multi-Task Learning
- URL: http://arxiv.org/abs/2203.16708v1
- Date: Wed, 30 Mar 2022 23:16:07 GMT
- Title: Task Adaptive Parameter Sharing for Multi-Task Learning
- Authors: Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran,
Charless Fowlkes, Rahul Bhotika, Stefano Soatto
- Abstract summary: Adaptive Task Adapting Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
- Score: 114.80350786535952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adapting pre-trained models with broad capabilities has become standard
practice for learning a wide range of downstream tasks. The typical approach of
fine-tuning different models for each task is performant, but incurs a
substantial memory cost. To efficiently learn multiple downstream tasks we
introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a
base model to a new task by adaptively modifying a small, task-specific subset
of layers. This enables multi-task learning while minimizing resources used and
competition between tasks. TAPS solves a joint optimization problem which
determines which layers to share with the base model and the value of the
task-specific weights. Further, a sparsity penalty on the number of active
layers encourages weight sharing with the base model. Compared to other
methods, TAPS retains high accuracy on downstream tasks while introducing few
task-specific parameters. Moreover, TAPS is agnostic to the model architecture
and requires only minor changes to the training scheme. We evaluate our method
on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and
show that it achieves state-of-the-art performance while being simple to
implement.
Related papers
- LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z) - On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z) - Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift ( TDS) and Task-Distribution Corruption (TDC)
arXiv Detail & Related papers (2023-11-23T15:46:54Z) - Multi-Objective Optimization for Sparse Deep Multi-Task Learning [0.0]
We present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs)
Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with particular focus on Deep Multi-Task models.
arXiv Detail & Related papers (2023-08-23T16:42:27Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task.
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
arXiv Detail & Related papers (2021-07-23T17:26:40Z) - Instance-Level Task Parameters: A Robust Multi-task Weighting Framework [17.639472693362926]
Recent works have shown that deep neural networks benefit from multi-task learning by learning a shared representation across several related tasks.
We let the training process dictate the optimal weighting of tasks for every instance in the dataset.
We conduct extensive experiments on SURREAL and CityScapes datasets, for human shape and pose estimation, depth estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-11T02:35:42Z) - Reparameterizing Convolutions for Incremental Multi-Task Learning
without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning)
Second, eliminating adverse interactions amongst tasks, which has been shown to significantly degrade the single-task performance in a multi-task setup (task interference)
arXiv Detail & Related papers (2020-07-24T14:44:46Z) - Knowledge Distillation for Multi-task Learning [38.20005345733544]
Multi-task learning (MTL) is to learn one single model that performs multiple tasks for achieving good performance on all tasks and lower cost on computation.
Learning such a model requires to jointly optimize losses of a set of tasks with different difficulty levels, magnitudes, and characteristics.
We propose a knowledge distillation based method in this work to address the imbalance problem in multi-task learning.
arXiv Detail & Related papers (2020-07-14T08:02:42Z) - HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable
Hyper Projections [96.64246471034195]
We propose textscHyperGrid, a new approach for highly effective multi-task learning.
Our method helps bridge the gap between fine-tuning and multi-task learning approaches.
arXiv Detail & Related papers (2020-07-12T02:49:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.