Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for
Downstream Tasks
- URL: http://arxiv.org/abs/2301.11560v1
- Date: Fri, 27 Jan 2023 06:49:47 GMT
- Authors: Haiyan Zhao, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang
- Abstract summary: We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a few large-scale pre-trained models become the major choices of various
applications, new challenges arise for model pruning, e.g., can we avoid
pruning the same model from scratch for every downstream task? How to reuse the
pruning results of previous tasks to accelerate the pruning for a new task? To
address these challenges, we create a small model for a new task from the
pruned models of similar tasks. We show that a few fine-tuning steps on this
model suffice to produce a promising pruned model for the new task. We study
this "meta-pruning" from nearest tasks on two major classes of pre-trained
models, convolutional neural network (CNN) and vision transformer (ViT), under
a limited budget of pruning iterations. Our study begins by investigating the
overlap of pruned models for similar tasks and how the overlap changes over
different layers and blocks. Inspired by these discoveries, we develop a simple
but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the
pruning iterations for a new task by initializing a sub-network from the pruned
models of its nearest tasks. In experiments, we demonstrate MVP's advantages in
accuracy, efficiency, and generalization through extensive empirical studies
and comparisons with popular pruning methods over several datasets.
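The core initialization idea can be illustrated with a small sketch. Assuming each nearest task contributes a binary keep-mask over the pre-trained weights, a majority vote selects the parameters to keep for the new task; the function name `meta_vote_mask` and the top-k tie-breaking are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def meta_vote_mask(neighbor_masks, keep_ratio):
    """Initialize a pruning mask for a new task by voting over the
    binary keep-masks of its nearest tasks (illustrative sketch)."""
    votes = np.sum(neighbor_masks, axis=0)          # per-parameter vote count
    k = int(keep_ratio * votes.size)                # number of weights to keep
    # Keep the k parameters with the most votes from neighbor tasks.
    keep_idx = np.argsort(votes.ravel())[::-1][:k]
    mask = np.zeros(votes.size, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(votes.shape)

# Pruned masks of three hypothetical nearest tasks, for a toy 2x3 layer:
m1 = np.array([[1, 0, 1], [0, 1, 0]])
m2 = np.array([[1, 1, 0], [0, 1, 0]])
m3 = np.array([[1, 0, 1], [0, 1, 1]])
init = meta_vote_mask(np.stack([m1, m2, m3]), keep_ratio=0.5)
```

The resulting sub-network would then be fine-tuned for a few steps on the new task, rather than pruned iteratively from scratch.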
Related papers
- MOWA: Multiple-in-One Image Warping Model [65.73060159073644]
We propose a Multiple-in-One image warping model (named MOWA) in this work.
We mitigate the difficulty of multi-task learning by disentangling the motion estimation at both the region level and pixel level.
To our knowledge, this is the first work that solves multiple practical warping tasks in one single model.
arXiv Detail & Related papers (2024-04-16T16:50:35Z)
- One-Shot Pruning for Fast-adapting Pre-trained Models on Devices [28.696989086706186]
Large-scale pre-trained models have been remarkably successful in resolving downstream tasks.
However, deploying these models on low-capability devices still requires an effective approach, such as model pruning.
We present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task.
arXiv Detail & Related papers (2023-07-10T06:44:47Z)
- Task-Robust Pre-Training for Worst-Case Downstream Adaptation [62.05108162160981]
Pre-training has achieved remarkable success when transferred to downstream tasks.
This paper considers pre-training a model that guarantees a uniformly good performance over the downstream tasks.
arXiv Detail & Related papers (2023-06-21T07:43:23Z)
- Structured Pruning for Multi-Task Deep Neural Networks [25.916166808223743]
Multi-task deep neural network (DNN) models have computation and storage benefits over individual single-task models.
We investigate the effectiveness of structured pruning on multi-task models.
arXiv Detail & Related papers (2023-04-13T22:15:47Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
Existing approaches for adapting pretrained models to vision-language tasks still rely on several key components that hinder their efficiency.
We instead direct effort toward efficient adaptation of existing models by augmenting Language Models with perception.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
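The freeze-almost-everything recipe can be sketched in miniature. In this toy example a fixed random encoder stands in for the frozen pretrained backbone, and only a single linear projection is trained; the shapes, learning rate, and regression task are illustrative assumptions, not eP-ALM's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen backbone: a fixed random feature extractor, never updated.
W_frozen = rng.standard_normal((8, 4))
def backbone(x):
    return np.tanh(x @ W_frozen)

# The only trainable component: one linear projection on top of frozen features.
W_proj = np.zeros((4, 1))

# Toy regression target that is exactly linear in the frozen features.
X = rng.standard_normal((32, 8))
y = backbone(X) @ np.array([[1.0], [-2.0], [0.5], [3.0]])

# Gradient descent touches only W_proj; W_frozen receives no updates.
for _ in range(500):
    feats = backbone(X)
    err = feats @ W_proj - y
    W_proj -= 0.1 * feats.T @ err / len(X)

mse = float(np.mean((backbone(X) @ W_proj - y) ** 2))
```

In the toy setting most parameters sit in the frozen backbone, mirroring the paper's point that training a small projection on top of frozen weights can suffice.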
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Pruning Pretrained Encoders with a Multitask Objective [12.062758391661847]
We compare pruning a single model with a multitask objective against the best ensemble of single-task models.
Additional analysis finds that using a multitask objective during pruning can also be an effective method for reducing model sizes for low-resource tasks.
arXiv Detail & Related papers (2021-12-10T17:57:33Z)
- Few Is Enough: Task-Augmented Active Meta-Learning for Brain Cell Classification [8.998976678920236]
We propose a tAsk-auGmented actIve meta-LEarning (AGILE) method to efficiently adapt Deep Neural Networks to new tasks.
AGILE combines a meta-learning algorithm with a novel task augmentation technique which we use to generate an initial adaptive model.
We show that the proposed task-augmented meta-learning framework can learn to classify new cell types after a single gradient step.
arXiv Detail & Related papers (2020-07-09T18:03:12Z)
- On the Effect of Dropping Layers of Pre-trained Transformer Models [35.25025837133909]
We explore strategies to drop layers in pre-trained models, and observe the effect of pruning on downstream GLUE tasks.
We were able to prune BERT, RoBERTa and XLNet models up to 40%, while maintaining up to 98% of their original performance.
Our experiments yield interesting observations: (i) the lower layers are most critical for maintaining downstream task performance, (ii) some tasks, such as paraphrase detection and sentence similarity, are more robust to layer dropping, and (iii) models trained with different objective functions exhibit different learning patterns with respect to layer dropping.
arXiv Detail & Related papers (2020-04-08T07:09:59Z)
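The layer-dropping strategy from the last entry can be sketched abstractly. Here a stack of residual transforms stands in for a pre-trained transformer encoder, and the top 40% of layers are dropped while the lower layers (found most critical) are kept; the layer count, residual form, and weight scale are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 10-layer encoder: each residual block stands in for a
# transformer layer.
layers = [rng.standard_normal((4, 4)) * 0.1 for _ in range(10)]

def encode(x, layer_list):
    for W in layer_list:
        x = x + np.tanh(x @ W)
    return x

# Drop the top 40% of layers, keeping the bottom 6 of 10, since the
# lower layers were observed to matter most for downstream performance.
kept = layers[:6]

x = rng.standard_normal((2, 4))
full = encode(x, layers)     # representation from the full stack
pruned = encode(x, kept)     # representation from the truncated stack
```

The truncated encoder would then be fine-tuned on the downstream task, trading a small accuracy drop for a 40% smaller model.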
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.