VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control
- URL: http://arxiv.org/abs/2308.09804v1
- Date: Fri, 18 Aug 2023 20:18:30 GMT
- Title: VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control
- Authors: Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang
- Abstract summary: In vision-and-language (VL), parameter-efficient tuning (PET) techniques are proposed to integrate modular modifications into encoder-decoder PLMs.
We propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework to impose effective control over modular modifications.
- Score: 44.73827206809393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the model size of pre-trained language models (PLMs) grows rapidly, full
fine-tuning becomes prohibitively expensive for model training and storage. In
vision-and-language (VL), parameter-efficient tuning (PET) techniques are
proposed to integrate modular modifications (e.g., Adapter and LoRA) into
encoder-decoder PLMs. By tuning a small set of trainable parameters, these
techniques perform on par with full fine-tuning. However, excessive modular
modifications and neglecting the functionality gap between the encoders and
decoders can lead to performance degradation, while existing PET techniques
(e.g., VL-Adapter) overlook these critical issues. In this paper, we propose a
Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework to impose
effective control over modular modifications via a novel granularity-controlled
mechanism. Considering different granularity-controlled matrices generated by
this mechanism, a variety of model-agnostic VL-PET modules can be instantiated
from our framework for better efficiency and effectiveness trade-offs. We
further propose lightweight PET module designs to enhance VL alignment and
modeling for the encoders and maintain text generation for the decoders.
Extensive experiments conducted on four image-text tasks and four video-text
tasks demonstrate the efficiency, effectiveness and transferability of our
VL-PET framework. In particular, our VL-PET-large with lightweight PET module
designs significantly outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37%
(7.03%) with BART-base (T5-base) on image-text tasks. Furthermore, we validate
the enhanced effect of employing our VL-PET designs on existing PET techniques,
enabling them to achieve significant performance improvements. Our code is
available at https://github.com/HenryHZY/VL-PET.
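
As a rough illustration of how a PET module and a granularity-controlled matrix might fit together, the sketch below adds a bottleneck adapter whose output is gated element-wise before being added back to the frozen backbone's hidden states. This is a minimal sketch under stated assumptions: the class name, the `granularity` options, and the `max_len` cap are illustrative choices, not the paper's official implementation (see the linked repository for that).

```python
# Minimal sketch (assumptions labeled): a bottleneck "modular modification"
# whose output is scaled element-wise by a trainable granularity-controlled
# matrix G before being added to the frozen PLM's hidden states.
import torch
import torch.nn as nn


class GranularityControlledAdapter(nn.Module):  # illustrative name, not from the paper
    def __init__(self, d_model: int, bottleneck: int = 64, granularity: str = "large"):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # down-projection
        self.up = nn.Linear(bottleneck, d_model)     # up-projection
        self.act = nn.ReLU()

        # Granularity-controlled matrix G: coarser choices use fewer trainable
        # scaling parameters, finer choices use more (an efficiency/effectiveness
        # trade-off). These particular variants are assumptions for illustration.
        if granularity == "small":        # a single scalar gate
            self.gate = nn.Parameter(torch.ones(1))
        elif granularity == "middle":     # one gate per hidden dimension
            self.gate = nn.Parameter(torch.ones(d_model))
        else:                             # "large": per-position and per-dimension
            max_len = 512                 # hypothetical sequence-length cap
            self.gate = nn.Parameter(torch.ones(max_len, d_model))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) output of a frozen PLM sub-layer
        delta = self.up(self.act(self.down(hidden)))  # modular modification
        gate = self.gate
        if gate.dim() == 2:                           # crop to the current length
            gate = gate[: hidden.size(1)]
        return hidden + gate * delta                  # granularity-controlled update
```

In a PET setup, only parameters like these adapter weights and gates would be trained while the encoder-decoder backbone (e.g., BART-base or T5-base) stays frozen, which is what keeps the number of trainable parameters small.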
Related papers
- ConPET: Continual Parameter-Efficient Tuning for Large Language Models [65.48107393731861]
Continual learning requires continual adaptation of models to newly emerging tasks.
We propose Continual Parameter-Efficient Tuning (ConPET), a generalizable paradigm for continual task adaptation of large language models.
arXiv Detail & Related papers (2023-09-26T08:52:04Z) - Parameter and Computation Efficient Transfer Learning for
Vision-Language Pre-trained Models [79.34513906324727]
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for vision-language pre-trained models.
We propose a novel dynamic architecture skipping (DAS) approach towards effective PCETL.
arXiv Detail & Related papers (2023-09-04T09:34:33Z) - Exploring the Impact of Model Scaling on Parameter-Efficient Tuning [100.61202305296275]
Parameter-efficient tuning (PET) methods can effectively drive extremely large pre-trained language models (PLMs) with only a small number of trainable parameters.
In small PLMs, there are usually noticeable performance differences among PET methods.
We introduce a more flexible PET method called Arbitrary PET (APET).
arXiv Detail & Related papers (2023-06-04T10:10:54Z) - A Unified Continual Learning Framework with General Parameter-Efficient
Tuning [56.250772378174446]
"Pre-training $rightarrow$ downstream adaptation" presents both new opportunities and challenges for Continual Learning.
We position prompting as one instantiation of PET, and propose a unified CL framework, dubbed Learning-Accumulation-Ensemble (LAE).
PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources.
arXiv Detail & Related papers (2023-03-17T15:52:45Z) - Towards a Unified View on Visual Parameter-Efficient Transfer Learning [96.99924127527002]
We propose a framework with a unified view called visual-PETL (V-PETL) to investigate the different aspects affecting the trade-off.
An effective scheme, Swin-BAPAT, derived from the proposed V-PETL framework achieves significantly better performance than the state-of-the-art AdaptFormer-Swin.
arXiv Detail & Related papers (2022-10-03T09:54:39Z)