Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt
- URL: http://arxiv.org/abs/2205.11100v2
- Date: Sat, 23 Mar 2024 09:14:19 GMT
- Title: Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt
- Authors: Jiangmeng Li, Wenyi Mo, Wenwen Qiang, Bing Su, Changwen Zheng, Hui Xiong, Ji-Rong Wen
- Abstract summary: Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts.
To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts.
However, how and what prompts can improve inference performance remains unclear.
- Score: 71.77504700496004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts. To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts, i.e., classification weights are synthesized from natural language describing task-relevant categories, to reduce the gap between tasks in the training and test phases. However, how and what prompts can improve inference performance remains unclear. In this paper, we explicitly clarify the importance of including semantic information in prompts, while existing prompting methods generate prompts without exploring the semantic information of textual labels. Manually constructing prompts with rich semantics requires domain expertise and is extremely time-consuming. To cope with this issue, we propose a semantic-aware prompt learning method, namely CPKP, which retrieves an ontological knowledge graph by treating the textual label as a query to extract task-relevant semantic information. CPKP further introduces a double-tier confounder-pruning procedure to refine the derived semantic information. The graph-tier confounders are gradually identified and phased out, inspired by the principle of Granger causality. The feature-tier confounders are demolished by following the maximum entropy principle in information theory. Empirically, the evaluations demonstrate the effectiveness of CPKP, e.g., with two shots, CPKP outperforms the manual-prompt method by 4.64% and the learnable-prompt method by 1.09% on average, and the superiority of CPKP in domain generalization compared to benchmark approaches. Our implementation is available at https://github.com/Mowenyii/CPKP.
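The abstract notes that, in prompt-based inference, classification weights are synthesized from natural language describing the categories. The following is a minimal sketch of that general idea only; it is not the CPKP implementation. The `toy_text_encoder` function, the prompt template, the embedding size, and the temperature are all hypothetical stand-ins chosen so the example runs without a pre-trained model.

```python
import zlib
import numpy as np

DIM = 64  # embedding size of the toy encoder (assumption, not from the paper)

def toy_text_encoder(text: str) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained text encoder.

    Each whitespace-separated token is mapped to a fixed pseudo-random
    vector (seeded by a CRC32 of the token); the sum is L2-normalized.
    """
    vec = np.zeros(DIM)
    for token in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(token.encode("utf-8")))
        vec += rng.standard_normal(DIM)
    return vec / np.linalg.norm(vec)

def build_classifier(labels, template="a photo of a {}."):
    """Synthesize one classification weight vector per label from its prompt."""
    return np.stack([toy_text_encoder(template.format(label)) for label in labels])

def classify(image_feature, weights, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities."""
    image_feature = image_feature / np.linalg.norm(image_feature)
    logits = weights @ image_feature / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

labels = ["cat", "dog", "airplane"]
weights = build_classifier(labels)

# Pretend image feature: the "dog" weight vector plus a little noise,
# standing in for the output of an aligned image encoder.
noise = np.random.default_rng(0).standard_normal(DIM)
image_feature = weights[1] + 0.01 * noise

probs = classify(image_feature, weights)
predicted = labels[int(np.argmax(probs))]
print(predicted, probs)
```

In this sketch a fixed template plays the role of a manual prompt; a learnable-prompt method would replace the template tokens with trainable vectors, and a knowledge-based method such as the one described above would instead derive the prompt content from external semantic information about each label.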
Related papers
- On the loss of context-awareness in general instruction fine-tuning [101.03941308894191]
Post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs can harm existing capabilities learned during pretraining.
We propose two methods to mitigate the loss of context awareness in instruct models: post-hoc attention steering on user prompts and conditional instruction fine-tuning with a context-dependency indicator.
arXiv Detail & Related papers (2024-11-05T00:16:01Z)
- Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model [27.56988000960972]
We introduce a new framework based on a dual context of both domain-shared and class-specific contexts.
Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in Large Language Models.
We also formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens.
arXiv Detail & Related papers (2024-07-05T13:15:29Z)
- Instructing Prompt-to-Prompt Generation for Zero-Shot Learning [116.33775552866476]
We propose a Prompt-to-Prompt generation methodology (P2P) to distill instructive visual prompts for transferable knowledge discovery.
The core of P2P is to mine semantic-related instruction from prompt-conditioned visual features and text instruction on modal-sharing semantic concepts.
arXiv Detail & Related papers (2024-06-05T07:59:48Z)
- Conditional Prototype Rectification Prompt Learning [32.533844163120875]
We propose a Conditional Prototype Rectification Prompt Learning (CPR) method to correct the bias of base examples and augment limited data in an effective way.
CPR achieves state-of-the-art performance on both few-shot classification and base-to-new generalization tasks.
arXiv Detail & Related papers (2024-04-15T15:43:52Z)
- DPL: Decoupled Prompt Learning for Vision-Language Models [41.90997623029582]
We propose a new method, Decoupled Prompt Learning, which reformulates the attention in prompt learning to alleviate this problem.
Our approach is flexible for both visual and textual modalities, making it easily extendable to multi-modal prompt learning.
arXiv Detail & Related papers (2023-08-19T15:48:38Z)
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting.
LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available.
LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)
- CUP: Curriculum Learning based Prompt Tuning for Implicit Event Argument Extraction [22.746071199667146]
Implicit event argument extraction (EAE) aims to identify arguments that could scatter over the document.
We propose a Curriculum learning based Prompt tuning (CUP) approach, which resolves implicit EAE by four learning stages.
In addition, we integrate a prompt-based encoder-decoder model to elicit related knowledge from pre-trained language models.
arXiv Detail & Related papers (2022-05-01T16:03:54Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework in which the noise level of the data used for finetuning decreases from stage to stage.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction [111.74812895391672]
We propose a Knowledge-aware Prompt-tuning approach with synergistic optimization (KnowPrompt).
We inject latent knowledge contained in relation labels into prompt construction with learnable virtual type words and answer words.
arXiv Detail & Related papers (2021-04-15T17:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.