One-Shot Pruning for Fast-adapting Pre-trained Models on Devices
- URL: http://arxiv.org/abs/2307.04365v1
- Date: Mon, 10 Jul 2023 06:44:47 GMT
- Title: One-Shot Pruning for Fast-adapting Pre-trained Models on Devices
- Authors: Haiyan Zhao, Guodong Long
- Abstract summary: Large-scale pre-trained models have been remarkably successful in resolving downstream tasks.
Deploying these models on low-capability devices still requires an effective approach, such as model pruning.
We present a scalable one-shot pruning method that leverages pruned knowledge of similar tasks to extract a sub-network from the pre-trained model for a new task.
- Score: 28.696989086706186
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale pre-trained models have been remarkably successful in resolving
downstream tasks. Nonetheless, deploying these models on low-capability devices
still requires an effective approach, such as model pruning. However, pruning
the model from scratch can pose a practical challenge given the limited
resources of each downstream task or device. To tackle this issue, we present a
scalable one-shot pruning method that leverages pruned knowledge of similar
tasks to extract a sub-network from the pre-trained model for a new task.
Specifically, we create a score mask using the pruned models of similar tasks
to identify task-specific filters/nodes in the pre-trained model for the new
task. Based on this mask, we conduct a single round of pruning to extract a
suitably-sized sub-network that can quickly adapt to the new task with only a
few training iterations. Our experimental analysis demonstrates the
effectiveness of the proposed method on the convolutional neural networks
(CNNs) and vision transformers (ViT) with various datasets. The proposed method
consistently outperforms popular pruning baseline methods in terms of accuracy
and efficiency when dealing with diverse downstream tasks with different memory
constraints.
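The abstract describes two steps: building a score mask from the pruned models of similar tasks, then pruning once based on that mask. The sketch below illustrates that idea in plain Python and is not the authors' implementation. The similarity-weighted vote, the per-filter binary masks, and the names `score_mask`, `one_shot_prune`, and `keep_ratio` are all assumptions made for illustration; the abstract does not specify how scores are computed or how the sub-network size is chosen.

```python
def score_mask(neighbor_masks, similarities):
    """Assumed scoring rule: similarity-weighted vote over binary filter masks.

    neighbor_masks: one 0/1 list per similar task, 1 = filter kept by that task's pruned model.
    similarities:   one positive weight per similar task (hypothetical similarity measure).
    """
    total = sum(similarities)
    weights = [s / total for s in similarities]
    n_filters = len(neighbor_masks[0])
    return [
        sum(w * mask[j] for w, mask in zip(weights, neighbor_masks))
        for j in range(n_filters)
    ]


def one_shot_prune(scores, keep_ratio):
    """Single pruning round: keep the top-scoring fraction of filters."""
    n_keep = max(1, round(keep_ratio * len(scores)))
    # sorted() is stable, so ties are broken in favour of the lower filter index.
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    keep = set(ranked[:n_keep])
    return [j in keep for j in range(len(scores))]


# Toy data: three similar tasks, six filters in one layer of the pre-trained model.
neighbor_masks = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
similarities = [0.5, 0.3, 0.2]

scores = score_mask(neighbor_masks, similarities)
kept = one_shot_prune(scores, keep_ratio=0.5)
print(kept)  # boolean keep/drop decision per filter
```

In this sketch `keep_ratio` stands in for the memory constraint of the target device; the extracted sub-network would then be fine-tuned for a few iterations on the new task, as the abstract states.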
Related papers
- Exploring Transferability for Randomized Smoothing [37.60675615521106]
We propose a method for pretraining certifiably robust models.
We find that surprisingly strong certified accuracy can be achieved even when finetuning on only clean images.
arXiv Detail & Related papers (2023-12-14T15:08:27Z)
- AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
- DiSparse: Disentangled Sparsification for Multitask Model Compression [92.84435347164435]
DiSparse is a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme.
Our experimental results demonstrate superior performance on various configurations and settings.
arXiv Detail & Related papers (2022-06-09T17:57:46Z)
- Task Adaptive Parameter Sharing for Multi-Task Learning [114.80350786535952]
Task Adaptive Parameter Sharing (TAPS) is a method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers.
Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters.
We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
arXiv Detail & Related papers (2022-03-30T23:16:07Z)
- Reconstruction Task Finds Universal Winning Tickets [24.52604301906691]
Pruning well-trained neural networks is an effective way to achieve a promising accuracy-efficiency trade-off in computer vision.
Most existing pruning algorithms focus only on the classification task defined on the source domain.
In this paper, we show that the image-level pretrain task is not capable of pruning models for diverse downstream tasks.
arXiv Detail & Related papers (2022-02-23T13:04:32Z)
- Mixed-Privacy Forgetting in Deep Networks [114.3840147070712]
We show that the influence of a subset of the training samples can be removed from the weights of a network trained on large-scale image classification tasks.
Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in a mixed-privacy setting.
We show that our method allows forgetting without having to trade off the model accuracy.
arXiv Detail & Related papers (2020-12-24T19:34:56Z)
- Lifelong Learning Without a Task Oracle [13.331659934508764]
Supervised deep neural networks are known to undergo a sharp decline in the accuracy of older tasks when new tasks are learned.
We propose and compare several candidate task-assigning mappers which require very little memory overhead.
Best-performing variants only impose an average cost of 1.7% parameter memory increase.
arXiv Detail & Related papers (2020-11-09T21:30:31Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.