Less is More -- Towards parsimonious multi-task models using structured sparsity
- URL: http://arxiv.org/abs/2308.12114v3
- Date: Thu, 30 Nov 2023 15:26:54 GMT
- Title: Less is More -- Towards parsimonious multi-task models using structured sparsity
- Authors: Richa Upadhyay, Ronald Phlypo, Rajkumar Saini, Marcus Liwicki
- Abstract summary: This work focuses on creating sparse models optimized for multiple tasks with fewer parameters.
We introduce channel-wise l1/l2 group sparsity in the parameters (or weights) of the shared convolutional layers of the multi-task learning model.
We analyzed the results of group sparsity in both single-task and multi-task settings on two widely-used Multi-Task Learning (MTL) datasets.
- Score: 4.874780144224057
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model sparsification in deep learning promotes simpler, more interpretable
models with fewer parameters. This not only reduces the model's memory
footprint and computational needs but also shortens inference time. This work
focuses on creating sparse models optimized for multiple tasks with fewer
parameters. These parsimonious models also possess the potential to match or
outperform dense models in terms of performance. In this work, we introduce
channel-wise l1/l2 group sparsity in the parameters (or weights) of the shared
convolutional layers of the multi-task learning model. This approach facilitates
the removal of extraneous groups, i.e., channels (due to l1 regularization), and
also imposes a penalty on the weights, further enhancing the learning efficiency
for all tasks (due to l2 regularization). We analyzed the results of group sparsity
in both single-task and multi-task settings on two widely-used Multi-Task
Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, which
consist of three different computer vision tasks each, multi-task models with
approximately 70% sparsity outperform their dense equivalents. We also
investigate how changing the degree of sparsification influences the model's
performance, the overall sparsity percentage, the patterns of sparsity, and the
inference time.
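As a rough illustration of the penalty described above, the following PyTorch sketch applies a channel-wise l1/l2 (group-lasso) term to a list of shared convolutional layers and adds it to the summed task losses. The layer shapes, the loss placeholders, and the sparsity_weight coefficient are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


def channelwise_group_sparsity(shared_convs):
    """Group-lasso (l1/l2) penalty over convolutional weights.

    Each output channel is treated as one group: an l2 norm is computed
    within the group and the results are summed (l1) across groups, so
    entire channels can be driven towards zero and later pruned.
    """
    penalty = torch.zeros(())
    for conv in shared_convs:
        w = conv.weight                                        # (out_ch, in_ch, kH, kW)
        group_norms = w.flatten(start_dim=1).norm(p=2, dim=1)  # one l2 norm per output channel
        penalty = penalty + group_norms.sum()                  # l1 sum over channel norms
    return penalty


# Illustrative shared backbone: two conv layers whose channels the penalty
# can sparsify; the real model is a full multi-task backbone.
shared_convs = [nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 128, 3, padding=1)]

# Placeholders for the per-task losses (e.g., segmentation, depth, and
# surface-normal losses on NYU-v2).
task_losses = [torch.rand(()) for _ in range(3)]

sparsity_weight = 1e-4  # illustrative coefficient controlling the degree of sparsification
total_loss = sum(task_losses) + sparsity_weight * channelwise_group_sparsity(shared_convs)
```

Channels whose group norm shrinks to (numerically) zero can then be removed, which is how structured sparsity of this kind translates into the parameter and inference-time savings discussed above.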
Related papers
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z) - Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - Adaptive Weight Assignment Scheme For Multi-task Learning [0.0]
Deep learning models are now used regularly across many applications.
Under multi-task learning settings, multiple tasks can be trained on a single model.
Training a model in this setting requires summing the loss values from the different tasks.
In this paper we propose a simple weight assignment scheme that improves the performance of the model.
arXiv Detail & Related papers (2023-03-10T08:06:08Z) - DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z) - Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z) - Rethinking Hard-Parameter Sharing in Multi-Task Learning [20.792654758645302]
Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of the model parameters, reducing storage cost and improving prediction accuracy.
The common sharing practice is to share the bottom layers of a deep neural network among tasks while using separate top layers for each task (a minimal sketch of this layout appears after this list).
Using separate bottom-layer parameters could achieve significantly better performance than the common practice.
arXiv Detail & Related papers (2021-07-23T17:26:40Z) - Reparameterizing Convolutions for Incremental Multi-Task Learning without Task Interference [75.95287293847697]
Two common challenges in developing multi-task models are often overlooked in the literature.
First, enabling the model to be inherently incremental, continuously incorporating information from new tasks without forgetting the previously learned ones (incremental learning).
Second, eliminating adverse interactions amongst tasks, which have been shown to significantly degrade the single-task performance in a multi-task setup (task interference).
arXiv Detail & Related papers (2020-07-24T14:44:46Z)
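For reference, the "Rethinking Hard-Parameter Sharing" entry above refers to the common practice of sharing bottom layers while keeping separate top layers per task. The sketch below shows a minimal PyTorch version of that layout; the layer sizes and task count are chosen purely for illustration.

```python
import torch
import torch.nn as nn


class HardSharingMTL(nn.Module):
    """Common hard-parameter-sharing layout: shared bottom layers,
    separate top layers (heads) for each task."""

    def __init__(self, num_tasks=3, num_classes=10):
        super().__init__()
        # Shared bottom: parameters used by every task.
        self.bottom = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Separate top layers: one small head per task.
        self.heads = nn.ModuleList(
            [nn.Linear(64, num_classes) for _ in range(num_tasks)]
        )

    def forward(self, x):
        z = self.bottom(x)                        # shared representation
        return [head(z) for head in self.heads]   # one prediction per task


model = HardSharingMTL()
outputs = model(torch.randn(4, 3, 64, 64))        # list of 3 task outputs, each of shape (4, 10)
```

The shared convolutional layers in the main paper follow the same shared-bottom idea, which is why a single group-sparsity penalty on those layers affects all tasks at once.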