Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt
Learning with Data-Dependent Prior
- URL: http://arxiv.org/abs/2401.06799v1
- Date: Tue, 9 Jan 2024 10:15:59 GMT
- Title: Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt
Learning with Data-Dependent Prior
- Authors: Youngjae Cho, HeeSun Bae, Seungjae Shin, Yeo Dong Youn, Weonyoung Joo,
Il-Chul Moon
- Abstract summary: Recent Vision-Language Pretrained models have become the backbone for many downstream tasks.
MLE training can lead the context vector to over-fit dominant image features in the training data.
This paper presents a Bayesian framework for prompt learning that can alleviate overfitting in few-shot learning applications.
- Score: 14.232144691524528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent Vision-Language Pretrained (VLP) models have become the backbone for
many downstream tasks, but they are typically utilized as frozen models without task-specific learning.
Prompt learning is a method to improve the pre-trained VLP model by adding a
learnable context vector to the inputs of the text encoder. In a few-shot
learning scenario of the downstream task, maximum likelihood estimation (MLE)
training can lead the context vector to over-fit dominant image features in the
training data. This
overfitting can potentially harm the generalization ability, especially in the
presence of a distribution shift between the training and test dataset. This
paper presents a Bayesian framework for prompt learning, which can
alleviate overfitting in few-shot learning applications and increase
the adaptability of prompts to unseen instances. Specifically, modeling a
data-dependent prior enhances the adaptability of text features to both seen
and unseen image features without trading off performance between them.
Based on the Bayesian framework, we utilize the Wasserstein Gradient Flow in
the estimation of our target posterior distribution, which enables our prompt
to be flexible in capturing the complex modes of image features. We demonstrate
the effectiveness of our method on benchmark datasets across several
experiments, showing statistically significant performance improvements over
existing methods. The code is available at https://github.com/youngjae-cho/APP.
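To make the mechanism concrete, below is a minimal sketch of prompt learning with a population of prompt "particles" updated by Stein variational gradient descent (SVGD), one common discretization of a Wasserstein gradient flow. The stub text encoder, class embeddings, and simple Gaussian prior are placeholders, not the authors' APP implementation (which uses a data-dependent prior):

```python
import torch

# Minimal sketch: K prompt "particles", each a learnable context vector that
# is prepended to class-name token embeddings before a (stub) text encoder.
# An SVGD-style update -- one practical discretization of a Wasserstein
# gradient flow -- moves the particles toward the posterior over prompts.

D, CTX_LEN, K = 512, 4, 4                 # embed dim, context length, #particles
particles = torch.randn(K, CTX_LEN, D, requires_grad=True)

def text_encoder(ctx, class_emb):
    # Stub for a frozen CLIP-like text encoder: mean-pools the
    # concatenated [context; class-name] token embeddings.
    return torch.cat([ctx, class_emb], dim=0).mean(dim=0)

def log_posterior(ctx, image_feat, class_embs, label):
    # Log-likelihood of the label under prompt-conditioned text features plus
    # a simple Gaussian log-prior on the context. (APP would make this prior
    # data-dependent, i.e. conditioned on image features.)
    text_feats = torch.stack([text_encoder(ctx, c) for c in class_embs])
    logits = image_feat @ text_feats.T
    return torch.log_softmax(logits, dim=-1)[label] - 0.5 * (ctx ** 2).sum()

def svgd_step(image_feat, class_embs, label, lr=1e-2, h=1.0):
    # Per-particle posterior gradients.
    grads = torch.zeros(K, CTX_LEN * D)
    for k in range(K):
        lp = log_posterior(particles[k], image_feat, class_embs, label)
        grads[k] = torch.autograd.grad(lp, particles)[0][k].reshape(-1)
    flat = particles.detach().reshape(K, -1)
    phi = torch.zeros_like(flat)
    for k in range(K):
        for j in range(K):
            diff = flat[j] - flat[k]
            kern = torch.exp(-(diff ** 2).sum() / (2 * h ** 2))
            # Kernel-weighted attraction toward high posterior density, plus
            # a repulsion term that keeps the particle set diverse enough to
            # cover multiple modes.
            phi[k] += kern * grads[j] - kern * diff / h ** 2
    with torch.no_grad():
        particles += lr * phi.reshape(K, CTX_LEN, D) / K

# Toy few-shot example: one image feature, three 2-token class names, label 0.
image_feat = torch.randn(D)
class_embs = [torch.randn(2, D) for _ in range(3)]
for _ in range(10):
    svgd_step(image_feat, class_embs, label=0)
```

In practice the bandwidth h would be set by a median heuristic and the prior would be produced by a network conditioned on image features, per the paper.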
Related papers
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the most downstream task-related subset of the pre-training data.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
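As a rough illustration of the zero-shot subset-extraction step, the sketch below scores pretraining images by their best CLIP similarity to the downstream class names and keeps the top fraction. It assumes OpenAI's `clip` package; the paths, class names, and the 10% threshold are placeholders, and the paper's actual selection criterion may differ:

```python
import torch
import clip
from PIL import Image

# Rough sketch: score each pretraining image by its best zero-shot similarity
# to the downstream class names, then keep the top fraction as the
# task-related subset. Paths and class names below are placeholders.

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["dog", "cat", "bird"]          # example downstream classes
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_feats = model.encode_text(prompts)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def relevance(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        feat = model.encode_image(image)
        feat = feat / feat.norm(dim=-1, keepdim=True)
    return (feat @ text_feats.T).max().item()  # best class similarity

pretrain_paths = ["img_0001.jpg", "img_0002.jpg"]  # placeholder paths
scored = sorted(pretrain_paths, key=relevance, reverse=True)
subset = scored[: max(1, len(scored) // 10)]       # keep top 10%
```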
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning [63.850451635362425]
Continual learning requires a model to adapt to ongoing changes in the data distribution.
We show that the combination of a large language model and an image generation model can similarly provide useful premonitions.
We find that the backbone of our pre-trained networks can learn representations useful for the downstream continual learning problem.
arXiv Detail & Related papers (2024-03-12T06:29:54Z)
- In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model [13.983810804606264]
We propose In-Context Prompt Learning (InCPL) for test-time visual recognition tasks.
InCPL associates a new test sample with very few labeled examples as context information.
We introduce a context-aware unsupervised loss to optimize visual prompts tailored to test samples.
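The summary does not spell out the context-aware unsupervised loss, so the sketch below uses prediction-entropy minimization, a common test-time objective, as a stand-in; the frozen classifier and the additive pixel prompt are illustrative stubs:

```python
import torch
import torch.nn.functional as F

# Sketch of test-time visual prompt tuning. The frozen model is a stub;
# the unsupervised objective here is entropy minimization, used as a
# stand-in for InCPL's context-aware loss (an assumption, not the paper's).

torch.manual_seed(0)
frozen_model = torch.nn.Linear(3 * 224 * 224, 10)  # stub frozen classifier
for p in frozen_model.parameters():
    p.requires_grad_(False)

visual_prompt = torch.zeros(3, 224, 224, requires_grad=True)  # additive prompt
optimizer = torch.optim.SGD([visual_prompt], lr=0.1)

def entropy(logits):
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

test_image = torch.rand(1, 3, 224, 224)   # a single unlabeled test sample
for _ in range(5):                        # a few test-time adaptation steps
    logits = frozen_model((test_image + visual_prompt).flatten(1))
    loss = entropy(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

prediction = frozen_model((test_image + visual_prompt).flatten(1)).argmax(-1)
```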
arXiv Detail & Related papers (2024-03-10T08:15:51Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
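A minimal sketch of the inverse dynamics part of the objective, predicting the action taken between two consecutive observations from their learned representations (toy dimensions and random data stand in for an embodied rollout, and this is not ALP's architecture):

```python
import torch
import torch.nn as nn

# Sketch of an inverse dynamics objective: encode two consecutive
# observations and predict the discrete action taken between them.

OBS_DIM, FEAT_DIM, N_ACTIONS = 128, 64, 6

encoder = nn.Sequential(nn.Linear(OBS_DIM, FEAT_DIM), nn.ReLU())
inv_dyn_head = nn.Linear(2 * FEAT_DIM, N_ACTIONS)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(inv_dyn_head.parameters()), lr=1e-3
)
loss_fn = nn.CrossEntropyLoss()

# Fake rollout batch: (obs_t, action_t, obs_{t+1}) triples.
obs_t = torch.randn(32, OBS_DIM)
obs_next = torch.randn(32, OBS_DIM)
actions = torch.randint(0, N_ACTIONS, (32,))

logits = inv_dyn_head(torch.cat([encoder(obs_t), encoder(obs_next)], dim=-1))
loss = loss_fn(logits, actions)   # ALP combines this with an RL policy loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```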
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that VPD enables faster adaptation to downstream visual perception tasks.
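The general mechanism, running a frozen text-conditioned denoising network and feeding its intermediate feature maps to a trainable perception head, can be sketched generically; the UNet below is a stub, not Stable Diffusion's actual module:

```python
import torch
import torch.nn as nn

# Generic sketch of the VPD idea: run a frozen text-conditioned denoising
# network on an image latent, read off intermediate feature maps, and feed
# them to a trainable perception head.

class StubUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Conv2d(4, 64, 3, padding=1)
        self.block2 = nn.Conv2d(64, 4, 3, padding=1)

    def forward(self, latent, text_emb):
        h = torch.relu(self.block1(latent)) + text_emb.view(1, -1, 1, 1)
        return self.block2(h), h          # noise prediction and features

unet = StubUNet().eval()
for p in unet.parameters():
    p.requires_grad_(False)

seg_head = nn.Conv2d(64, 21, 1)   # trainable head, e.g. 21 segmentation classes

latent = torch.randn(1, 4, 32, 32)  # image latent (from a frozen VAE in practice)
text_emb = torch.randn(64)          # text embedding of a class-name prompt

with torch.no_grad():
    _, feats = unet(latent, text_emb)
seg_logits = seg_head(feats)        # (1, 21, 32, 32) dense prediction
```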
arXiv Detail & Related papers (2023-03-03T18:59:47Z)
- Generative Negative Text Replay for Continual Vision-Language Pretraining [95.2784858069843]
Vision-language pre-training has attracted increasing attention recently.
Massive data are usually collected in a streaming fashion.
We propose a multi-modal knowledge distillation between images and texts to align the instance-wise prediction between old and new models.
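Instance-wise alignment between old and new models can be written as a KL distillation loss over image-text similarity distributions; below is a minimal sketch with stub encoder outputs (the paper's exact loss may differ):

```python
import torch
import torch.nn.functional as F

# Sketch of instance-wise multi-modal distillation: the new model's
# image-to-text similarity distribution is pulled toward the old (frozen)
# model's. Feature tensors stand in for encoder outputs.

def similarities(img_feats, txt_feats, temp=0.07):
    img_feats = F.normalize(img_feats, dim=-1)
    txt_feats = F.normalize(txt_feats, dim=-1)
    return img_feats @ txt_feats.T / temp      # (B, B) image-text logits

B, D = 8, 128
old_img, old_txt = torch.randn(B, D), torch.randn(B, D)  # frozen old model
new_img = torch.randn(B, D, requires_grad=True)          # new model outputs
new_txt = torch.randn(B, D, requires_grad=True)

with torch.no_grad():
    teacher = F.softmax(similarities(old_img, old_txt), dim=-1)
student_log = F.log_softmax(similarities(new_img, new_txt), dim=-1)

kd_loss = F.kl_div(student_log, teacher, reduction="batchmean")
kd_loss.backward()   # combined with the usual contrastive loss in training
```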
arXiv Detail & Related papers (2022-10-31T13:42:21Z)
- DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [91.56988987393483]
We present a new framework for dense prediction by implicitly and explicitly leveraging the pre-trained knowledge from CLIP.
Specifically, we convert the original image-text matching problem in CLIP to a pixel-text matching problem and use the pixel-text score maps to guide the learning of dense prediction models.
Our method is model-agnostic, which can be applied to arbitrary dense prediction systems and various pre-trained visual backbones.
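The pixel-text matching step reduces to scoring dense visual features against per-class text embeddings; a minimal sketch with illustrative shapes (DenseCLIP adds context-aware prompting on top of this):

```python
import torch
import torch.nn.functional as F

# Sketch of pixel-text matching: per-pixel visual features are scored
# against per-class text embeddings, giving one score map per class.

B, C, H, W, K = 2, 512, 14, 14, 20    # batch, channels, spatial, #classes

pixel_feats = F.normalize(torch.randn(B, C, H, W), dim=1)  # dense features
text_feats = F.normalize(torch.randn(K, C), dim=1)         # class embeddings

# (B, C, H, W) x (K, C) -> (B, K, H, W) pixel-text score maps
score_maps = torch.einsum("bchw,kc->bkhw", pixel_feats, text_feats)

# The score maps can guide a dense prediction model, e.g. as auxiliary
# per-pixel classification logits (random labels here for illustration):
aux_loss = F.cross_entropy(score_maps / 0.07, torch.randint(0, K, (B, H, W)))
```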
arXiv Detail & Related papers (2021-12-02T18:59:32Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
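Cross-modality region contrastive learning amounts to an InfoNCE objective in which each region's visual feature should match its paired text feature against in-batch negatives; a minimal sketch with random placeholder features:

```python
import torch
import torch.nn.functional as F

# Sketch of cross-modality region contrastive learning: each image region's
# feature is trained to match its paired text feature (InfoNCE against
# in-batch negatives). Random tensors stand in for encoder outputs.

N, D, temp = 16, 256, 0.07          # regions, feature dim, temperature

region_raw = torch.randn(N, D, requires_grad=True)  # region encoder output
text_raw = torch.randn(N, D, requires_grad=True)    # text encoder output
region_feats = F.normalize(region_raw, dim=-1)
text_feats = F.normalize(text_raw, dim=-1)

logits = region_feats @ text_feats.T / temp   # (N, N) similarity matrix
targets = torch.arange(N)                     # i-th region pairs with i-th text

# Symmetric InfoNCE: region -> text and text -> region
loss = 0.5 * (F.cross_entropy(logits, targets) +
              F.cross_entropy(logits.T, targets))
loss.backward()
```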
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.