Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models
- URL: http://arxiv.org/abs/2307.15049v2
- Date: Sun, 6 Aug 2023 14:05:38 GMT
- Title: Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained
Vision-Language Models
- Authors: Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao,
Zheng-Jun Zha, Wei Chen, Yujun Shen
- Abstract summary: We design a new type of tuning method, termed regularized mask tuning, which masks the network parameters through a learnable selection.
Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage.
It is noteworthy that we deliver an 18.73% performance improvement over zero-shot CLIP by masking an average of only 2.56% of the parameters.
- Score: 89.07925369856139
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt tuning and adapter tuning have shown great potential in transferring
pre-trained vision-language models (VLMs) to various downstream tasks. In this
work, we design a new type of tuning method, termed regularized mask tuning,
which masks the network parameters through a learnable selection. Inspired by
neural pathways, we argue that the knowledge required by a downstream task
already exists in the pre-trained weights but just gets concealed in the
upstream pre-training stage. To bring the useful knowledge back to light, we
first identify a set of parameters that are important to a given downstream
task, then attach a binary mask to each parameter, and finally optimize these
masks on the downstream data with the parameters frozen. When updating the
mask, we introduce a novel gradient dropout strategy to regularize the
parameter selection, in order to prevent the model from forgetting old
knowledge and overfitting the downstream data. Experimental results on 11
datasets demonstrate the consistent superiority of our method over previous
alternatives. It is noteworthy that we deliver an 18.73% performance
improvement over zero-shot CLIP by masking an average of only 2.56% of the
parameters. Furthermore, our method is synergistic with most existing
parameter-efficient tuning methods and can boost the performance on top of
them. Project page can be found here (https://wuw2019.github.io/R-AMT/).
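As a rough PyTorch illustration of the procedure described in the abstract (not the authors' released code), the sketch below attaches a learnable mask to a frozen linear layer: real-valued scores are binarized with a straight-through estimator so that only the mask is optimized, and a registered hook randomly drops a fraction of the mask gradients as a simple stand-in for the gradient dropout regularizer. The class name, threshold, and drop rate are assumptions.

```python
import torch
import torch.nn as nn


class MaskedLinear(nn.Module):
    """Frozen pre-trained linear layer whose weights are gated by a learnable binary mask.

    Sketch of the idea in the abstract: the pre-trained weight stays frozen,
    a real-valued score per weight is learned, and a hard 0/1 mask is obtained
    with a straight-through estimator. `grad_drop` randomly zeroes a fraction
    of the mask gradients, a simple stand-in for the paper's gradient dropout
    regularizer (these details are assumptions, not the official recipe).
    """

    def __init__(self, pretrained: nn.Linear, grad_drop: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        # Learnable real-valued scores; a positive score keeps the weight (mask = 1).
        self.scores = nn.Parameter(torch.ones_like(self.weight))
        self.grad_drop = grad_drop
        if grad_drop > 0:
            self.scores.register_hook(self._drop_grad)

    def _drop_grad(self, grad: torch.Tensor) -> torch.Tensor:
        # Randomly drop a fraction of the mask gradients on every backward pass.
        keep = (torch.rand_like(grad) > self.grad_drop).to(grad.dtype)
        return grad * keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hard = (self.scores > 0).to(self.scores.dtype)
        # Straight-through estimator: forward uses the hard 0/1 mask,
        # backward passes gradients through to the underlying scores.
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Usage: wrap a pre-trained layer and optimize only the mask scores.
layer = MaskedLinear(nn.Linear(512, 512))
optimizer = torch.optim.AdamW([layer.scores], lr=1e-3)
```

Only the `scores` tensors are handed to the optimizer, so the pre-trained weights themselves never change.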
Related papers
- Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z)
- Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling [32.603558214472265]
We introduce Attention Prompt Tuning (APT) for video-based applications such as action recognition.
APT involves injecting a set of learnable prompts along with data tokens during fine-tuning while keeping the backbone frozen.
The proposed approach greatly reduces the number of FLOPs and latency while achieving a significant performance boost (a generic sketch of prompt injection follows this entry).
arXiv Detail & Related papers (2024-03-11T17:59:41Z)
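The APT entry above boils down to standard prompt tuning: learnable prompt tokens are concatenated with the data tokens, and only the prompts (plus a small head) are trained while the backbone stays frozen. The sketch below illustrates that generic idea; it is not the APT implementation, and `embed_dim`, `num_prompts`, `num_classes`, and the stand-in backbone are placeholder assumptions.

```python
import torch
import torch.nn as nn


class PromptTunedEncoder(nn.Module):
    """Generic prompt tuning: learnable prompt tokens are injected alongside the
    input tokens and fed through a frozen backbone; only the prompts and a small
    head are trained. Illustrative sketch, not the APT implementation."""

    def __init__(self, backbone: nn.Module, embed_dim: int,
                 num_prompts: int = 8, num_classes: int = 400):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # keep the pre-trained backbone frozen
            p.requires_grad_(False)
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) embeddings of the data tokens.
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        x = torch.cat([prompts, tokens], dim=1)  # inject prompts alongside data tokens
        x = self.backbone(x)                     # frozen transformer blocks
        return self.head(x.mean(dim=1))          # pool and classify


# Stand-in backbone made of standard transformer encoder layers (placeholder sizes).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 12, batch_first=True), num_layers=2)
model = PromptTunedEncoder(backbone, embed_dim=768)
optimizer = torch.optim.AdamW([model.prompts] + list(model.head.parameters()), lr=1e-3)
```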
- Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains new state-of-the-art results in both performance and storage efficiency, storing only 0.03% of the parameters of full fine-tuning (a rough weight-space adapter sketch follows this entry).
arXiv Detail & Related papers (2023-05-26T08:44:42Z)
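The adapter-on-parameters idea above can be illustrated with an additive low-rank delta that lives in weight space and can be merged into the pre-trained weight after training, so inference adds no latency. This is a generic sketch under that assumption, not the paper's exact formulation; the low-rank parameterization and all names are placeholders.

```python
import torch
import torch.nn as nn


class WeightSpaceAdapter(nn.Module):
    """Adds a trainable low-rank delta directly to a frozen pre-trained weight.

    Illustrative sketch only: because the adapter lives in weight space, the
    delta can be folded into the weight after training, so inference runs the
    original layer with no extra modules or latency.
    """

    def __init__(self, pretrained: nn.Linear, rank: int = 8):
        super().__init__()
        out_f, in_f = pretrained.weight.shape
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        self.down = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable
        self.up = nn.Parameter(torch.zeros(out_f, rank))          # trainable, zero delta at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(x, self.weight + self.up @ self.down, self.bias)

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold the learned delta back into a plain linear layer for deployment.
        fused = nn.Linear(self.weight.shape[1], self.weight.shape[0], bias=self.bias is not None)
        fused.weight.copy_(self.weight + self.up @ self.down)
        if self.bias is not None:
            fused.bias.copy_(self.bias)
        return fused
```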
- Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning for various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs) named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, significantly outperforming previous methods on 11 benchmark datasets (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
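TaskRes, as summarized above, keeps the prior (pre-trained, text-based) classifier frozen and learns an additive task residual on top of it. The sketch below illustrates that decoupling; the scaling factor `alpha` and the cosine-similarity classifier follow common CLIP-style practice and are assumptions here, not the official TaskRes code.

```python
import torch
import torch.nn as nn


class TaskResidualClassifier(nn.Module):
    """Frozen prior classifier (e.g., CLIP text embeddings of class prompts)
    plus a learnable, task-specific residual. Sketch of the decoupling idea,
    not the official TaskRes implementation."""

    def __init__(self, text_weights: torch.Tensor, alpha: float = 0.5):
        super().__init__()
        # text_weights: (num_classes, embed_dim), pre-computed and kept frozen.
        self.base = nn.Parameter(text_weights.clone(), requires_grad=False)
        self.residual = nn.Parameter(torch.zeros_like(text_weights))  # new task knowledge
        self.alpha = alpha

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        weights = self.base + self.alpha * self.residual
        weights = weights / weights.norm(dim=-1, keepdim=True)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        return image_features @ weights.t()  # cosine-similarity logits
```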
- Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks [49.807185872741066]
We propose a simple tuning method which only introduces three trainable vectors.
We input the integrated hidden state(s) to a task-specific linear classifier to predict categories.
This scheme is similar to the way ELMo utilises hidden states, except that ELMo feeds them to LSTM-based models.
arXiv Detail & Related papers (2022-04-10T04:14:02Z)
- Training Neural Networks with Fixed Sparse Masks [19.58969772430058]
Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations (a simplified sketch follows this entry).
arXiv Detail & Related papers (2021-11-18T18:06:01Z)
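The fixed-sparse-mask entry above selects a small subset of parameters once and updates only that subset for the rest of training. The sketch below approximates this by ranking parameters with a squared-gradient importance proxy computed on one calibration batch (an assumption for illustration; the paper uses its own criterion) and zeroing all other gradients at every step.

```python
import torch
import torch.nn as nn


def compute_fixed_masks(model: nn.Module, loss: torch.Tensor, keep_ratio: float = 0.005) -> dict:
    """Build one boolean mask per parameter, keeping the top `keep_ratio` fraction
    by squared gradient magnitude on a calibration loss. Importance proxy is an
    assumption for this sketch."""
    names = [n for n, p in model.named_parameters() if p.requires_grad]
    params = [p for _, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    scores = torch.cat([g.pow(2).flatten() for g in grads])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return {name: g.pow(2) >= threshold for name, g in zip(names, grads)}


def apply_masks_to_grads(model: nn.Module, masks: dict) -> None:
    """After loss.backward(), zero the gradients of parameters outside the fixed
    mask, so only the selected subset is updated over all iterations."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in masks:
            p.grad.mul_(masks[name].to(p.grad.dtype))
```

In this sketch the masks would be computed once from a small calibration batch, then `apply_masks_to_grads` is called after every `loss.backward()` and before `optimizer.step()`.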
- Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose a continual-learning approach for the task-aware regime that avoids forgetting entirely.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.