Related papers: Differentiable Prompt Learning for Vision Language Models

URL: http://arxiv.org/abs/2501.00457v1
Date: Tue, 31 Dec 2024 14:13:28 GMT
Title: Differentiable Prompt Learning for Vision Language Models
Authors: Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao
Abstract summary: We propose a method dubbed differentiable prompt learning (DPL). DPL is formulated as an optimization problem to automatically determine the optimal context length of the prompt to be added to each layer. We empirically find that by using only limited data, our DPL method can find a deep continuous prompt configuration with high confidence.
Score: 49.132774679968456
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prompt learning is an effective way to exploit the potential of large-scale pre-trained foundational models. Continuous prompts parameterize context tokens in prompts by turning them into differentiable vectors. Deep continuous prompts insert prompts not only in the input but also in the intermediate hidden representations. Manually designed deep continuous prompts exhibit a remarkable improvement compared to the zero-shot pre-trained model on downstream tasks. How to automate the continuous prompt design is an underexplored area, and a fundamental question arises: is a manually designed deep prompt strategy optimal? To answer this question, we propose a method dubbed differentiable prompt learning (DPL). The DPL method is formulated as an optimization problem to automatically determine the optimal context length of the prompt to be added to each layer, where the objective is to maximize the performance. We test the DPL method on the pre-trained CLIP. We empirically find that by using only limited data, our DPL method can find a deep continuous prompt configuration with high confidence. The performance on the downstream tasks exhibits the superiority of the automatic design: our method boosts the average test accuracy by 2.60% on 11 datasets compared to baseline methods. Besides, our method focuses only on the prompt configuration (i.e., the context length for each layer), which means that our method is compatible with the baseline methods that have sophisticated designs to boost the performance. The DPL method can be deployed to large language models or computer vision models at no cost.

Related papers

FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation [17.51747913191231]
We propose Faster Distillation-Based Prompt Learning (FDBPL). It addresses issues by sharing soft supervision contexts across multiple training stages and implementing accelerated I/O. Comprehensive evaluations across 11 datasets demonstrate superior performance in base-to-new generalization, cross-dataset transfer, and robustness tests, achieving $2.2\times$ faster training speed.
arXiv Detail & Related papers (2025-05-23T15:57:16Z)
IPO: Interpretable Prompt Optimization for Vision-Language Models [40.83071220530289]
This paper introduces a simple but interpretable prompt optimizer (IPO). IPO utilizes large language models (LLMs) to generate textual prompts dynamically. We incorporate a large multimodal model (LMM) to condition on visual content by generating image descriptions.
arXiv Detail & Related papers (2024-10-20T14:10:22Z)
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
Efficient Prompting Methods for Large Language Models: A Survey [50.82812214830023]
Efficient prompting methods have attracted a wide range of attention. We discuss Automatic Prompt Engineering for different prompt components and Prompt Compression in continuous and discrete spaces.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
LAMM: Label Alignment for Multi-Modal Prompt Learning [17.478967970736115]
We introduce an innovative label alignment method named LAMM, which can adjust the category embeddings of downstream datasets through end-to-end training. Our method significantly improves the performance of existing multi-modal prompt learning models in few-shot scenarios. Our methodology exhibits preeminence in continual learning compared to other prompt tuning methods.
arXiv Detail & Related papers (2023-12-13T15:29:52Z)
When Prompt-based Incremental Learning Does Not Meet Strong Pretraining [36.0889029038102]
In this work, we develop a learnable Adaptive Prompt Generator (APG). The key is to unify the prompt retrieval and prompt learning processes into a learnable prompt generator. Our method significantly outperforms advanced methods in exemplar-free incremental learning without (strong) pretraining.
arXiv Detail & Related papers (2023-08-21T03:33:21Z)
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting. LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available. LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
Instance-wise Prompt Tuning for Pretrained Language Models [72.74916121511662]
Instance-wise Prompt Tuning (IPT) is the first prompt learning paradigm that injects knowledge from the input data instances to the prompts. IPT significantly outperforms task-based prompt learning methods, and achieves comparable performance to conventional finetuning with only 0.5% - 1.5% of tuned parameters.
arXiv Detail & Related papers (2022-06-04T10:08:50Z)
RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning [84.75064077323098]
This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs). Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods.
arXiv Detail & Related papers (2022-05-25T07:50:31Z)
Learning a Better Initialization for Soft Prompts via Meta-Learning [58.53984967461313]
We propose MetaPT (Meta-learned Prompt Tuning) to improve prompt tuning. We introduce the structure by first clustering pre-training data into different auxiliary tasks. We use these tasks to pre-train prompts with a meta-learning algorithm.
arXiv Detail & Related papers (2022-05-25T03:50:23Z)
Making Pre-trained Language Models End-to-end Few-shot Learners with Contrastive Prompt Tuning [41.15017636192417]
We present CP-Tuning, the first end-to-end Contrastive Prompt Tuning framework for fine-tuning Language Models. It is integrated with the task-invariant continuous prompt encoding technique with fully trainable prompt parameters. Experiments over a variety of language understanding tasks used in IR systems and different PLMs show that CP-Tuning outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-01T02:24:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.