DiSparse: Disentangled Sparsification for Multitask Model Compression
- URL: http://arxiv.org/abs/2206.04662v1
- Date: Thu, 9 Jun 2022 17:57:46 GMT
- Title: DiSparse: Disentangled Sparsification for Multitask Model Compression
- Authors: Xinglong Sun, Ali Hassani, Zhangyang Wang, Gao Huang, Humphrey Shi
- Abstract summary: DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
- Score: 92.84435347164435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the popularity of Model Compression and Multitask Learning, how to
effectively compress a multitask model has been less thoroughly analyzed due to
the challenging entanglement of tasks in the parameter space. In this paper, we
propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning
and sparse training scheme. We consider each task independently by
disentangling the importance measurement and taking unanimous decisions among
all tasks when performing parameter pruning and selection. Our experimental
results demonstrate superior performance on various configurations and settings
compared to popular sparse training and pruning methods. Besides the
effectiveness in compression, DiSparse also provides a powerful tool to the
multitask learning community. Surprisingly, we even observed better performance
than some dedicated multitask learning methods in several cases despite the
high model sparsity enforced by DiSparse. We analyzed the pruning masks
generated with DiSparse and observed strikingly similar sparse network
architectures identified by each task even before training starts. We also
observe the existence of a "watershed" layer where the task relatedness sharply
drops, implying no benefit in continued parameter sharing. Our code and
models will be available at:
https://github.com/SHI-Labs/DiSparse-Multitask-Model-Compression.
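The abstract does not spell out implementation details, but the core idea it describes, per-task (disentangled) importance scoring followed by a unanimous pruning decision on the shared parameters, can be illustrated with a short PyTorch-style sketch. The |w * grad| saliency, the 90% target sparsity, and the toy two-task trunk below are assumptions for illustration, not the authors' exact procedure; here a parameter is pruned only when every task independently marks it as unimportant, which is one plausible reading of "unanimous decisions".

```python
# Illustrative sketch: per-task importance scoring on shared parameters,
# followed by a unanimous pruning vote. The |w * grad| saliency, the 90%
# target sparsity, and the toy trunk/heads are assumptions, not the exact
# DiSparse procedure.
import torch
import torch.nn as nn
import torch.nn.functional as F

def unanimous_prune_masks(shared_params, task_losses, sparsity=0.9):
    """Return {name: mask}; a mask entry is 0 (pruned) only when every task
    independently marked that parameter as unimportant."""
    keep = {n: torch.zeros_like(p, dtype=torch.bool) for n, p in shared_params.items()}
    for loss in task_losses:
        grads = torch.autograd.grad(loss, list(shared_params.values()), retain_graph=True)
        for (name, p), g in zip(shared_params.items(), grads):
            score = (p * g).abs()                            # this task's saliency estimate
            n_keep = max(int(score.numel() * (1 - sparsity)), 1)
            thresh = torch.topk(score.flatten(), n_keep).values.min()
            keep[name] |= score >= thresh                    # kept if ANY task finds it important
    return {n: m.float() for n, m in keep.items()}

# Toy two-task model: a shared trunk with a classification and a regression head.
trunk = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
head_cls, head_reg = nn.Linear(32, 10), nn.Linear(32, 1)
x = torch.randn(8, 16)
feat = trunk(x)
loss_cls = F.cross_entropy(head_cls(feat), torch.randint(0, 10, (8,)))
loss_reg = F.mse_loss(head_reg(feat), torch.randn(8, 1))

masks = unanimous_prune_masks(dict(trunk.named_parameters()), [loss_cls, loss_reg])
for name, p in trunk.named_parameters():
    p.data.mul_(masks[name])                                 # apply the agreed-upon masks
```

Because the keep votes are OR-ed across tasks, the realized sparsity of the shared trunk can end up lower than the per-task target; the sparsity levels reported in the paper depend on its actual scoring and voting scheme.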
Related papers
- Localizing Task Information for Improved Model Merging and Compression [61.16012721460561]
We show that the information required to solve each task is still preserved after merging as different tasks mostly use non-overlapping sets of weights.
We propose Consensus Merging, an algorithm that eliminates such weights and improves the general performance of existing model merging approaches.
arXiv Detail & Related papers (2024-05-13T14:54:37Z) - Cross-Task Affinity Learning for Multitask Dense Scene Predictions [5.939164722752263]
Multitask learning (MTL) has become prominent for its ability to predict multiple tasks jointly.
We introduce the Cross-Task Affinity Learning (CTAL) module, a lightweight framework that enhances task refinement in multitask networks.
Our results demonstrate state-of-the-art MTL performance for both CNN and transformer backbones, using significantly fewer parameters than single-task learning.
arXiv Detail & Related papers (2024-01-20T05:31:47Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners [67.5865966762559]
We study whether sparsely activated Mixture-of-Experts (MoE) improve multi-task learning.
We devise task-aware gating functions to route examples from different tasks to specialized experts.
This results in a sparsely activated multi-task model with a large number of parameters, but with the same computational cost as that of a dense model.
arXiv Detail & Related papers (2022-04-16T00:56:12Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Pruning Pretrained Encoders with a Multitask Objective [12.062758391661847]
We compare pruning a single model with a multitask objective against the best ensemble of single-task models.
Additional analysis finds that using a multitask objective during pruning can also be an effective method for reducing model sizes for low-resource tasks.
arXiv Detail & Related papers (2021-12-10T17:57:33Z) - Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share the bottom layers of a deep neural network among tasks while using separate top layers for each task (a minimal sketch of this layout appears after this list).
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
arXiv Detail & Related papers (2021-07-23T17:26:40Z) - Parameter-Efficient Transfer Learning with Diff Pruning [108.03864629388404]
diff pruning is a simple approach to enable parameter-efficient transfer learning within the pretrain-finetune framework.
We find that models finetuned with diff pruning can match the performance of fully finetuned baselines on the GLUE benchmark.
arXiv Detail & Related papers (2020-12-14T12:34:01Z)
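As a companion to the hard-parameter-sharing entry above, here is a minimal sketch of the shared-bottom/separate-top layout it describes. Layer sizes, the two example task heads, and the class name are illustrative assumptions rather than any specific paper's architecture.

```python
# Minimal sketch of hard parameter sharing: bottom layers shared by all
# tasks, with one task-specific top layer per task. Sizes and the two
# example output dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class HardSharedMTL(nn.Module):
    def __init__(self, in_dim=64, hidden=128, task_out_dims=(10, 1)):
        super().__init__()
        self.shared_bottom = nn.Sequential(          # parameters shared across all tasks
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(                  # separate top layer for each task
            [nn.Linear(hidden, d) for d in task_out_dims]
        )

    def forward(self, x):
        feat = self.shared_bottom(x)
        return [head(feat) for head in self.heads]   # one prediction per task

model = HardSharedMTL()
outputs = model(torch.randn(4, 64))                  # list with one tensor per task
```

The "watershed" layer observed in the DiSparse abstract suggests where such sharing stops paying off: layers beyond that point are candidates for moving from the shared bottom into the task-specific heads.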