On the Role of Attention in Prompt-tuning
- URL: http://arxiv.org/abs/2306.03435v1
- Date: Tue, 6 Jun 2023 06:23:38 GMT
- Title: On the Role of Attention in Prompt-tuning
- Authors: Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos
Thrampoulidis
- Abstract summary: We study prompt-tuning for one-layer attention architectures under contextual mixture-models.
We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention.
We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
- Score: 90.97555030446563
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Prompt-tuning is an emerging strategy to adapt large language models (LLM) to
downstream tasks by learning a (soft-)prompt parameter from data. Despite its
success in LLMs, there is limited theoretical understanding of the power of
prompt-tuning and the role of the attention mechanism in prompting. In this
work, we explore prompt-tuning for one-layer attention architectures and study
contextual mixture-models where each input token belongs to a context-relevant
or -irrelevant set. We isolate the role of prompt-tuning through a
self-contained prompt-attention model. Our contributions are as follows: (1) We
show that softmax-prompt-attention is provably more expressive than
softmax-self-attention and linear-prompt-attention under our contextual data
model. (2) We analyze the initial trajectory of gradient descent and show that
it learns the prompt and prediction head with near-optimal sample complexity
and demonstrate how the prompt can provably attend to sparse context-relevant
tokens. (3) Assuming a known prompt but an unknown prediction head, we
characterize the exact finite sample performance of prompt-attention which
reveals the fundamental performance limits and the precise benefit of the
context information. We also provide experiments that verify our theoretical
insights on real datasets and demonstrate how prompt-tuning enables the model
to attend to context-relevant information.
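The prompt-attention model isolated above admits a compact description: a single trainable prompt vector attends over the input tokens through a softmax, and a linear prediction head reads out the attention-weighted average. The following is a minimal NumPy sketch of that setup under assumed notation (the weight matrix W, the single prompt token p, and the toy contextual mixture are illustrative choices, not the authors' code):

```python
# Minimal sketch (assumed notation, not the authors' code): a one-layer
# prompt-attention model in which a learned prompt vector p attends over
# the n input tokens X and a linear head u produces a scalar prediction.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prompt_attention(X, W, p, u):
    """X: (n, d) tokens; W: (d, d) attention weights; p: (d,) soft prompt;
    u: (d,) prediction head. Returns f(X) = u^T X^T softmax(X W p)."""
    attn = softmax(X @ W @ p)   # how strongly the prompt attends to each token
    context = attn @ X          # attention-weighted average of the tokens
    return u @ context          # linear readout

# Toy contextual mixture: a few context-relevant tokens carry the signal mu,
# the remaining tokens are noise; a prompt aligned with mu should up-weight
# the relevant tokens, mirroring the behavior analyzed in the paper.
rng = np.random.default_rng(0)
d, n = 8, 16
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)
X = 0.5 * rng.normal(size=(n, d))
X[:3] += mu                     # three context-relevant tokens
W = np.eye(d)
print(prompt_attention(X, W, p=3 * mu, u=mu))              # aligned prompt
print(prompt_attention(X, W, p=rng.normal(size=d), u=mu))  # random prompt
```

In this toy example the aligned prompt concentrates the softmax weight on the context-relevant tokens, which is the attention behavior the analysis above formalizes.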
Related papers
- Revisiting Prompt Pretraining of Vision-Language Models [13.888505919946578]
We propose a general framework termed Revisiting Prompt Pretraining (RPP)
RPP aims to improve fitting and generalization ability from two aspects: prompt structure and prompt supervision.
We additionally utilize soft labels derived from zero-shot probability predictions provided by a pretrained Contrastive Language Image Pretraining (CLIP) teacher model.
arXiv Detail & Related papers (2024-09-10T02:36:13Z)
- IntCoOp: Interpretability-Aware Vision-Language Prompt Tuning [94.52149969720712]
IntCoOp learns to jointly align attribute-level inductive biases and class embeddings during prompt-tuning.
IntCoOp improves CoOp by 7.35% in average performance across 10 diverse datasets.
arXiv Detail & Related papers (2024-06-19T16:37:31Z)
- Visual Attention Prompted Prediction and Learning [18.29952735827087]
This paper introduces a novel framework for attention-prompted prediction and learning, utilizing visual prompts to steer the model's reasoning process.
Experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompts.
arXiv Detail & Related papers (2023-10-12T15:39:54Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding [51.48361798508375]
We develop an information-theoretic framework that formulates soft prompt tuning as maximizing mutual information between prompts and other model parameters.
We show that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods.
arXiv Detail & Related papers (2023-06-08T04:31:48Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt that improves in-context learning performance (a toy sketch of such a search follows this list).
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to seen prompts, and improves generalization to unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
- Prompt-Learning for Fine-Grained Entity Typing [40.983849729537795]
We investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios.
We propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types.
arXiv Detail & Related papers (2021-08-24T09:39:35Z)
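As a concrete illustration of the greedy prompt search mentioned in the Fairness-guided Few-shot Prompting entry above, here is a hedged sketch: the bias proxy, the candidate pool, and the `predict` callable are illustrative assumptions rather than that paper's actual procedure or API.

```python
# Hedged sketch of a fairness-guided greedy prompt search: greedily append the
# candidate demonstration whose addition keeps predictions least skewed toward
# any single label on a small validation set. All names are illustrative.
from typing import Callable, List

def label_bias(predict: Callable[[str], str], prompt: str,
               val_inputs: List[str], labels: List[str]) -> float:
    """Crude predictive-bias proxy: gap between the most- and least-frequent
    predicted label over the validation inputs."""
    counts = {y: 0 for y in labels}
    for x in val_inputs:
        pred = predict(prompt + "\n" + x)
        if pred in counts:
            counts[pred] += 1
    freqs = [c / len(val_inputs) for c in counts.values()]
    return max(freqs) - min(freqs)

def greedy_prompt_search(predict: Callable[[str], str], candidates: List[str],
                         val_inputs: List[str], labels: List[str],
                         max_len: int = 4) -> str:
    """Greedily build a prompt from the candidates that minimizes the bias proxy."""
    prompt, pool = "", list(candidates)
    for _ in range(min(max_len, len(pool))):
        _, best = min((label_bias(predict, prompt + "\n" + c, val_inputs, labels), c)
                      for c in pool)
        prompt += "\n" + best
        pool.remove(best)
    return prompt.strip()
```

Here `predict` would wrap an LLM call mapping a prompted input string to one of the candidate labels; the search keeps the demonstrations whose inclusion leaves predictions least concentrated on any single label.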