HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both
Language and Vision-and-Language Tasks
- URL: http://arxiv.org/abs/2203.03878v1
- Date: Tue, 8 Mar 2022 06:51:33 GMT
- Authors: Zhengkun Zhang, Wenya Guo, Xiaojun Meng, Yasheng Wang, Yadao Wang, Xin
Jiang, Qun Liu, Zhenglu Yang
- Abstract summary: How to perform parameter-efficient fine-tuning has become fairly important for quick transfer learning and deployment.
We design a novel unified parameter-efficient transfer learning framework that works effectively on both pure language and V&L tasks.
Our proposed framework adds fewer trainable parameters in multi-task learning while achieving superior performances and transfer ability compared to state-of-the-art methods.
- Score: 38.43269863509866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The workflow of pretraining and fine-tuning has emerged as a popular paradigm
for solving various NLP and V&L (Vision-and-Language) downstream tasks. With
the capacity of pretrained models growing rapidly, how to perform
parameter-efficient fine-tuning has become fairly important for quick transfer
learning and deployment. In this paper, we design a novel unified
parameter-efficient transfer learning framework that works effectively on both
pure language and V&L tasks. In particular, we use a shared hypernetwork that
takes trainable hyper-embeddings as input, and outputs weights for fine-tuning
different small modules in a pretrained language model, such as tuning the
parameters inserted into multi-head attention blocks (i.e., prefix-tuning) and
feed-forward blocks (i.e., adapter-tuning). We define a set of embeddings
(e.g., layer, block, task and visual embeddings) as the key components to
calculate hyper-embeddings, which thus can support both pure language and V&L
tasks. Our proposed framework adds fewer trainable parameters in multi-task
learning while achieving superior performances and transfer ability compared to
state-of-the-art methods. Empirical results on the GLUE benchmark and multiple
V&L tasks confirm the effectiveness of our framework on both textual and visual
modalities.
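The mechanism described in the abstract can be illustrated with a minimal NumPy sketch: layer, block and task embeddings are combined into a hyper-embedding, and one shared hypernetwork maps it to the weights of an adapter (feed-forward block) or to prefix vectors (attention block). This is a sketch under stated assumptions, not the authors' implementation. All sizes, the concatenate-then-project step, the tanh non-linearity, the linear generator heads, and the function names are hypothetical choices made for illustration. Visual embeddings are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the paper.
D_MODEL = 16       # hidden size of the frozen backbone
D_EMB = 8          # size of each layer/block/task embedding
D_BOTTLENECK = 4   # adapter bottleneck width
PREFIX_LEN = 3     # number of generated prefix vectors
N_LAYERS, N_TASKS = 2, 3
BLOCKS = ("attention", "feed_forward")

# Trainable embeddings: one vector per layer, per block type, per task.
layer_emb = rng.normal(size=(N_LAYERS, D_EMB))
block_emb = rng.normal(size=(len(BLOCKS), D_EMB))
task_emb = rng.normal(size=(N_TASKS, D_EMB))

# Assumed projection that fuses the three embeddings into one hyper-embedding.
W_proj = rng.normal(scale=0.1, size=(3 * D_EMB, D_EMB))

# Shared hypernetwork heads (assumed linear). They are reused for every
# layer and task, so adding a task only adds one row to task_emb.
W_down_gen = rng.normal(scale=0.1, size=(D_EMB, D_MODEL * D_BOTTLENECK))
W_up_gen = rng.normal(scale=0.1, size=(D_EMB, D_BOTTLENECK * D_MODEL))
W_prefix_gen = rng.normal(scale=0.1, size=(D_EMB, 2 * PREFIX_LEN * D_MODEL))


def hyper_embedding(layer, block, task):
    """Fuse layer, block and task embeddings into one vector."""
    parts = np.concatenate([layer_emb[layer], block_emb[block], task_emb[task]])
    return np.tanh(parts @ W_proj)


def generate_adapter(layer, block, task):
    """Generate down/up projection weights for a bottleneck adapter."""
    h = hyper_embedding(layer, block, task)
    down = (h @ W_down_gen).reshape(D_MODEL, D_BOTTLENECK)
    up = (h @ W_up_gen).reshape(D_BOTTLENECK, D_MODEL)
    return down, up


def generate_prefix(layer, block, task):
    """Generate key and value prefix vectors for an attention block."""
    h = hyper_embedding(layer, block, task)
    prefix = (h @ W_prefix_gen).reshape(2, PREFIX_LEN, D_MODEL)
    return prefix[0], prefix[1]


def adapter_forward(x, down, up):
    """Residual bottleneck adapter with a ReLU in the middle."""
    return x + np.maximum(x @ down, 0.0) @ up


# Usage: generate task-specific modules for layer 0 and apply the adapter.
x = rng.normal(size=(5, D_MODEL))  # 5 token states from the frozen model
down, up = generate_adapter(layer=0, block=1, task=2)
y = adapter_forward(x, down, up)
key_prefix, value_prefix = generate_prefix(layer=0, block=0, task=2)
print(y.shape, key_prefix.shape, value_prefix.shape)
```

In this sketch the generator heads are shared across all layers and tasks, so the per-task cost is a single embedding vector. That is one plausible reading of how a shared hypernetwork keeps the number of trainable parameters low in multi-task learning.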
Related papers
- Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning [59.001091197106085]
Multi-Task Learning (MTL) for Vision Transformer aims at enhancing the model capability by tackling multiple tasks simultaneously.
Most recent works have predominantly focused on designing Mixture-of-Experts (MoE) structures and integrating Low-Rank Adaptation (LoRA) to efficiently perform multi-task learning.
We propose a novel approach dubbed Efficient Multi-Task Learning (EMTAL) by transforming a pre-trained Vision Transformer into an efficient multi-task learner.
arXiv Detail & Related papers (2025-01-12T17:41:23Z)
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment [58.94611347128066]
Task Preference Optimization (TPO) is a novel method that utilizes differentiable task preferences derived from typical fine-grained visual tasks.
By leveraging rich visual labels during training, TPO significantly enhances the MLLM's multimodal capabilities and task-specific performance.
Our instantiation of this approach with VideoChat and LLaVA demonstrates an overall 14.6% improvement in multimodal performance compared to baseline models.
arXiv Detail & Related papers (2024-12-26T18:56:05Z)
- ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning [38.26304604660713]
ADEM-VL is an efficient vision-language method that tunes models based on pretrained large language models.
Our framework surpasses existing methods by an average accuracy of 0.77% on ScienceQA dataset.
arXiv Detail & Related papers (2024-10-23T11:31:06Z)
- Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generate visual prompts that, after being trained on only one model, can transfer to different models and improve their performance on downstream tasks.
We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z)
- Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters [13.6682552098234]
Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks.
We present ALTER, a system that effectively builds the multi-tAsk learners with mixTure-of-task-adaptERs upon small language models.
A two-stage training method is proposed to optimize the collaboration between adapters at a small computational cost.
arXiv Detail & Related papers (2023-09-20T03:39:56Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort to efficient adaptations of existing models, and propose to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Few-shot Multimodal Multitask Multilingual Learning [0.0]
We propose few-shot learning for a multimodal multitask multilingual (FM3) setting by adapting pre-trained vision and language models.
FM3 learns the most prominent tasks in the vision and language domains along with their intersections.
arXiv Detail & Related papers (2023-02-19T03:48:46Z)
- Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks [36.34331439747556]
We propose Polyhistor and Polyhistor-Lite to share information across different tasks with a few trainable parameters.
Specifically, Polyhistor achieves competitive accuracy compared to the state-of-the-art while only using 10% of their trainable parameters.
arXiv Detail & Related papers (2022-10-07T00:25:02Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)