TransTailor: Pruning the Pre-trained Model for Improved Transfer
Learning
- URL: http://arxiv.org/abs/2103.01542v1
- Date: Tue, 2 Mar 2021 07:58:35 GMT
- Title: TransTailor: Pruning the Pre-trained Model for Improved Transfer
Learning
- Authors: Bingyan Liu, Yifeng Cai, Yao Guo, Xiangqun Chen
- Abstract summary: We propose TransTailor, targeting at pruning the pre-trained model for improved transfer learning.
We prune and fine-tune the pre-trained model according to the target-aware weight importance.
We transfer a more suitable sub-structure that can be fine-tuned to improve the final performance.
- Score: 5.9292619981667976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing availability of pre-trained models has significantly
improved the performance of transfer learning on tasks with limited data.
However, progress on transfer learning mainly focuses on optimizing the weights
of pre-trained models, which ignores the structural mismatch between the model
and the target task. This paper aims to improve transfer performance from
another angle: in addition to tuning the weights, we tune the structure of
pre-trained models to better match the target task. To this end, we propose
TransTailor, which prunes the pre-trained model for improved transfer learning.
Unlike traditional pruning pipelines, we prune and fine-tune the pre-trained
model according to target-aware weight importance, generating an optimal
sub-model tailored to the specific target task. In this way, we transfer a more
suitable sub-structure that can be fine-tuned to improve the final performance.
Extensive experiments on multiple pre-trained models and datasets demonstrate
that TransTailor outperforms traditional pruning methods and achieves
competitive or even better performance than other state-of-the-art transfer
learning methods while using a smaller model. Notably, on the Stanford Dogs
dataset, TransTailor achieves a 2.7% accuracy improvement over other transfer
methods while using 20% fewer FLOPs.
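The recipe sketched in the abstract (score weights with a target-aware importance measure on the target task, prune, then fine-tune the resulting sub-model) can be illustrated with a short PyTorch sketch. This is not the authors' implementation: the importance score below is a simple first-order Taylor estimate accumulated on target-task batches, the pruning is done with torch.nn.utils.prune masks, and the backbone, placeholder data, and 0.2 pruning ratio are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

def target_aware_filter_importance(model, loader, criterion, device="cpu"):
    """Score each conv filter by |weight * grad| accumulated on target-task batches."""
    model.to(device).train()
    scores = {m: torch.zeros(m.out_channels, device=device)
              for m in model.modules() if isinstance(m, nn.Conv2d)}
    for inputs, labels in loader:
        model.zero_grad()
        loss = criterion(model(inputs.to(device)), labels.to(device))
        loss.backward()
        for m in scores:
            # First-order Taylor importance, summed per output filter.
            scores[m] += (m.weight * m.weight.grad).abs().sum(dim=(1, 2, 3)).detach()
    return scores

def prune_least_important(scores, ratio=0.2):
    """Mask out the `ratio` least-important filters of every conv layer."""
    for m, s in scores.items():
        k = int(m.out_channels * ratio)
        if k == 0:
            continue
        keep = torch.ones(m.out_channels, dtype=torch.bool, device=s.device)
        keep[torch.topk(s, k, largest=False).indices] = False
        mask = keep.view(-1, 1, 1, 1).expand_as(m.weight).to(m.weight.dtype)
        prune.custom_from_mask(m, name="weight", mask=mask)

if __name__ == "__main__":
    model = models.resnet18(weights="IMAGENET1K_V1")  # any pre-trained backbone
    model.fc = nn.Linear(model.fc.in_features, 120)   # e.g. 120 Stanford Dogs classes
    data = torch.utils.data.TensorDataset(torch.randn(8, 3, 224, 224),
                                          torch.randint(0, 120, (8,)))
    loader = torch.utils.data.DataLoader(data, batch_size=4)
    scores = target_aware_filter_importance(model, loader, nn.CrossEntropyLoss())
    prune_least_important(scores, ratio=0.2)
    # The masked sub-model would then be fine-tuned on the target data as usual.
```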
Related papers
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, achieving up to a 9.6% improvement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
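A minimal sketch of the parameter-freezing idea summarized in the Forecast-PEFT entry above: the pre-trained backbone is frozen and only newly added adapter weights, a learnable prompt tensor, and the task head receive gradients. The toy backbone, dimensions, and the way the prompt is injected are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter applied on top of frozen features (residual form)."""
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class PEFTWrapper(nn.Module):
    def __init__(self, backbone, dim, num_prompts=4):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze all pre-trained weights
            p.requires_grad = False
        # Crude stand-in for learnable prompts (real prompt tuning prepends
        # learnable tokens to a transformer's input sequence).
        self.prompt = nn.Parameter(torch.zeros(num_prompts, dim))
        self.adapter = Adapter(dim)
        self.head = nn.Linear(dim, 2)          # new trainable task head

    def forward(self, x):
        feats = self.backbone(x)               # (batch, dim) frozen features
        feats = self.adapter(feats + self.prompt.mean(dim=0))
        return self.head(feats)

backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64))
model = PEFTWrapper(backbone, dim=64)
# Only the adapter, prompt, and head parameters reach the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```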
- Transferring Knowledge from Large Foundation Models to Small Downstream Models [40.38657103236168]
We introduce Adaptive Feature Transfer (AFT) to transfer knowledge from large pre-trained models to small downstream models.
AFT operates purely on features, decoupling the choice of the pre-trained model from the smaller downstream model.
AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost.
arXiv Detail & Related papers (2024-06-11T15:06:15Z)
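A rough sketch of feature-level transfer in the spirit of the AFT entry above: a small student is trained on the downstream task while a learned linear projection pulls its features toward the frozen features of a large pre-trained model, so the choice of pre-trained model stays decoupled from the student architecture. The networks, the MSE feature loss, and the 0.1 weighting are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 128))
for p in teacher.parameters():
    p.requires_grad = False                  # large pre-trained model stays fixed

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
proj = nn.Linear(32, 128)                    # maps student features to teacher space
head = nn.Linear(32, 5)                      # downstream classification head

params = list(student.parameters()) + list(proj.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
beta = 0.1                                   # weight of the feature-transfer term

x, y = torch.randn(8, 32), torch.randint(0, 5, (8,))   # placeholder target batch
s_feat = student(x)
loss = F.cross_entropy(head(s_feat), y) + beta * F.mse_loss(proj(s_feat), teacher(x))
loss.backward()
optimizer.step()
```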
- Efficient Transferability Assessment for Selection of Pre-trained Detectors [63.21514888618542]
This paper studies the efficient transferability assessment of pre-trained object detectors.
We build up a detector transferability benchmark which contains a large and diverse zoo of pre-trained detectors.
Experimental results demonstrate that our method outperforms other state-of-the-art approaches in assessing transferability.
arXiv Detail & Related papers (2024-03-14T14:23:23Z)
- Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance [20.732095457775138]
Transferability measurement quantifies how transferable a pre-trained model learned on a source task is to a target task.
We propose TMI (TRANSFERABILITY MEASUREMENT WITH INTRA-CLASS FEATURE VARIANCE), a fast and accurate algorithm to measure transferability.
arXiv Detail & Related papers (2023-08-11T07:50:40Z)
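The intra-class feature variance idea behind TMI (entry above) can be made concrete with a small sketch: features of a frozen extractor are computed on target data and their per-class variance is averaged, with lower variance read as a higher transferability proxy. How TMI actually defines and aggregates its score is not reproduced here; the extractor and data below are placeholders.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def intra_class_variance_score(extractor, inputs, labels):
    """Negative mean intra-class feature variance as a transferability proxy."""
    extractor.eval()
    feats = extractor(inputs)                      # (N, D) features on target data
    variances = []
    for c in labels.unique():
        class_feats = feats[labels == c]
        if class_feats.shape[0] > 1:
            variances.append(class_feats.var(dim=0, unbiased=True).mean())
    # Tighter per-class clusters (lower variance) give a higher score.
    return -torch.stack(variances).mean().item()

extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
x, y = torch.randn(100, 32), torch.randint(0, 5, (100,))   # placeholder target set
print("transferability proxy:", intra_class_variance_score(extractor, x, y))
```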
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
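A hedged sketch of the zero-shot structured pruning described in the Towards Compute-Optimal Transfer Learning entry above: convolutional filters are ranked purely by L1 norm, with no target data or gradients, and the smallest ones are masked before any fine-tuning. The 0.25 ratio and the torchvision ResNet-18 are illustrative choices, not the paper's setup.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

def zero_shot_prune_filters(model, ratio=0.25):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # ln_structured masks whole filters (dim=0) by L1 norm of their weights.
            prune.ln_structured(m, name="weight", amount=ratio, n=1, dim=0)

model = models.resnet18(weights=None)    # use weights="IMAGENET1K_V1" for a real run
zero_shot_prune_filters(model, ratio=0.25)
masked = sum((m.weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item()
             for m in model.modules() if isinstance(m, nn.Conv2d))
print("filters fully masked:", masked)
```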
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets, while requiring 10x less data and 5x less pre-training time.
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- CARTL: Cooperative Adversarially-Robust Transfer Learning [22.943270371841226]
In deep learning, a typical strategy for transfer learning is to freeze the early layers of a pre-trained model and fine-tune the rest of its layers on the target domain.
We propose cooperative adversarially-robust transfer learning (CARTL), which pre-trains the model via feature distance minimization and then fine-tunes it with non-expansive fine-tuning for target-domain tasks.
arXiv Detail & Related papers (2021-06-12T02:29:55Z)
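The conventional strategy described in the CARTL entry above (freeze the early layers of a pre-trained model and fine-tune the rest on the target domain) can be sketched as follows; CARTL's feature-distance pre-training and non-expansive fine-tuning are not implemented here, and the choice of which layers to freeze is illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=None)            # pre-trained weights in practice
model.fc = nn.Linear(model.fc.in_features, 10)   # new target-domain head

# Freeze the early layers; later blocks and the new head stay trainable.
for block in (model.conv1, model.bn1, model.layer1, model.layer2):
    for p in block.parameters():
        p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2, momentum=0.9)
```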
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
- Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
Adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z)
- Renofeation: A Simple Transfer Learning Method for Improved Adversarial Robustness [26.73248223512572]
A recent adversarial attack can successfully deceive models trained with transfer learning via end-to-end fine-tuning.
This raises security concerns for many industrial applications.
We propose noisy feature distillation, a new transfer learning method.
arXiv Detail & Related papers (2020-02-07T20:07:22Z)
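One way to realize the "noisy feature distillation" idea from the Renofeation entry above is sketched below: the student is trained on the target task with an additional loss that matches its dropout-perturbed features to those of the frozen pre-trained model, instead of relying on end-to-end fine-tuning alone. This is an assumption about the general idea, not the paper's method; the architectures, dropout rate, and loss weight are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pretrained = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
for p in pretrained.parameters():
    p.requires_grad = False                      # frozen reference features

student = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
head = nn.Linear(64, 10)
dropout = nn.Dropout(p=0.3)                      # the "noise" on student features
optimizer = torch.optim.Adam(list(student.parameters()) + list(head.parameters()), lr=1e-3)

x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))   # placeholder target batch
s_feat = student(x)
loss = F.cross_entropy(head(s_feat), y) + 0.1 * F.mse_loss(dropout(s_feat), pretrained(x))
loss.backward()
optimizer.step()
```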