Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label
Prompt Tuning
- URL: http://arxiv.org/abs/2306.01669v2
- Date: Fri, 8 Mar 2024 03:19:39 GMT
- Title: Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label
Prompt Tuning
- Authors: Cristina Menghini, Andrew Delworth, Stephen H. Bach
- Abstract summary: We study the use of pseudolabels, i.e., heuristic labels for unlabeled data, to enhance CLIP via prompt tuning.
We observe that learning paradigms such as semi-supervised, transductive zero-shot, and unsupervised learning can all be seen as optimizing the same loss function.
We find that (1) unexplored prompt tuning strategies that iteratively refine pseudolabels consistently improve CLIP accuracy, by 19.5 points in semi-supervised learning, by 28.4 points in transductive zero-shot learning, and by 15.2 points in unsupervised learning.
- Score: 11.284317518288153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is
often necessary to optimize their performance. However, a major obstacle is the
limited availability of labeled data. We study the use of pseudolabels, i.e.,
heuristic labels for unlabeled data, to enhance CLIP via prompt tuning.
Conventional pseudolabeling trains a model on labeled data and then generates
labels for unlabeled data. VLMs' zero-shot capabilities enable a "second
generation" of pseudolabeling approaches that do not require task-specific
training on labeled data. By using zero-shot pseudolabels as a source of
supervision, we observe that learning paradigms such as semi-supervised,
transductive zero-shot, and unsupervised learning can all be seen as optimizing
the same loss function. This unified view enables the development of versatile
training strategies that are applicable across learning paradigms. We
investigate them on image classification tasks where CLIP exhibits limitations,
by varying prompt modalities, e.g., textual or visual prompts, and learning
paradigms. We find that (1) unexplored prompt tuning strategies that
iteratively refine pseudolabels consistently improve CLIP accuracy, by 19.5
points in semi-supervised learning, by 28.4 points in transductive zero-shot
learning, and by 15.2 points in unsupervised learning, and (2) unlike
conventional semi-supervised pseudolabeling, which exacerbates model biases
toward classes with higher-quality pseudolabels, prompt tuning leads to a more
equitable distribution of per-class accuracy. The code to reproduce the
experiments is at https://github.com/BatsResearch/menghini-neurips23-code.
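The core pseudolabeling step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: toy L2-normalised vectors stand in for CLIP image embeddings, and identity rows stand in for prompt (class) embeddings. The top-k-per-class selection and the function name `top_k_pseudolabels` are assumptions for the sketch; iterative strategies would re-run this step as the prompts are tuned.

```python
import numpy as np

def top_k_pseudolabels(image_feats, class_embeds, top_k=2):
    """Zero-shot pseudolabeling: score each image against class (prompt)
    embeddings and keep only the top-k most confident images per class."""
    sims = image_feats @ class_embeds.T          # cosine similarity, shape (N, C)
    preds = sims.argmax(axis=1)                  # zero-shot label per image
    conf = sims.max(axis=1)                      # confidence of that label
    kept = []
    for c in range(class_embeds.shape[0]):
        idx = np.flatnonzero(preds == c)
        # Most confident examples of class c first; keep at most top_k.
        kept.extend(idx[np.argsort(-conf[idx])][:top_k].tolist())
    kept = np.array(sorted(kept))
    return kept, preds[kept]

# Toy stand-ins: 8 "images" in a 4-d embedding space, 3 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
prompts = np.eye(4)[:3]                          # hypothetical class embeddings
kept, labels = top_k_pseudolabels(feats, prompts)
```

Because at most `top_k` examples are kept per class, the resulting pseudolabeled set is class-balanced by construction, which relates to the paper's observation about a more equitable per-class accuracy distribution.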
Related papers
- Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data [9.132277138594652]
We propose a Candidate Pseudolabel Learning method to fine-tune vision-language models with abundant unlabeled data.
Our method can result in better performance in true label inclusion and class-balanced instance selection.
arXiv Detail & Related papers (2024-06-15T04:50:20Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our intriguing findings highlight the usage of VC learning in dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels.
We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
- Learning with Partial Labels from Semi-supervised Perspective [28.735185883881172]
Partial Label (PL) learning refers to the task of learning from partially labeled data.
We propose a novel PL learning method, namely Partial Label learning with Semi-Supervised Perspective (PLSP)
PLSP significantly outperforms the existing PL baseline methods, especially on high ambiguity levels.
arXiv Detail & Related papers (2022-11-24T15:12:16Z)
- Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation [21.163070161951868]
Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unsupervised data into the training.
Current SSL approaches use an initially supervised trained model to generate predictions for unlabelled images, called pseudo-labels.
We use three mechanisms to control pseudo-label noise and errors.
arXiv Detail & Related papers (2022-10-19T09:46:27Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
An ensemble-labels strategy is adopted to update pseudo labels, stabilizing the training of deep neural networks with noisy labels.
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach that leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- A Closer Look at Self-training for Zero-Label Semantic Segmentation [53.4488444382874]
Being able to segment unseen classes not observed during training is an important technical challenge in deep learning.
Prior zero-label semantic segmentation works approach this task by learning visual-semantic embeddings or generative models.
We propose a consistency regularizer to filter out noisy pseudo-labels by taking the intersections of the pseudo-labels generated from different augmentations of the same image.
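The intersection idea described above can be sketched in a few lines. This is a toy illustration with hypothetical per-pixel class probabilities, not the paper's code: a pixel keeps its pseudo-label only when predictions from two augmentations of the same image agree, and is marked as ignored otherwise.

```python
import numpy as np

def intersect_pseudolabels(probs_aug1, probs_aug2, ignore_index=-1):
    """Keep a pixel's pseudo-label only when the predictions from two
    augmentations agree; disagreeing pixels get ignore_index."""
    labels1 = probs_aug1.argmax(axis=-1)
    labels2 = probs_aug2.argmax(axis=-1)
    agree = labels1 == labels2
    return np.where(agree, labels1, ignore_index)

# Toy per-pixel class probabilities for a 2x2 "image", 3 classes.
p1 = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
               [[0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]])
p2 = np.array([[[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],
               [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]])
pseudo = intersect_pseudolabels(p1, p2)
# The top-right pixel disagrees between augmentations, so it is ignored.
```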
arXiv Detail & Related papers (2021-04-21T14:34:33Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
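A simplified sketch of uncertainty-aware selection is below. It is not the UPS implementation: predictive entropy stands in as a proxy for the framework's uncertainty estimate, and the thresholds are illustrative. A pseudo-label is kept only when the prediction is both confident (high max probability) and certain (low entropy).

```python
import numpy as np

def select_pseudolabels(probs, conf_thresh=0.7, unc_thresh=0.5):
    """Keep pseudo-labels that are confident AND low-uncertainty.
    Entropy is a stand-in for a proper uncertainty estimate."""
    conf = probs.max(axis=1)                                  # max class probability
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)    # predictive entropy
    mask = (conf >= conf_thresh) & (entropy <= unc_thresh)
    return np.flatnonzero(mask), probs.argmax(axis=1)[mask]

probs = np.array([[0.90, 0.05, 0.05],   # confident and low-entropy: kept
                  [0.40, 0.30, 0.30],   # unconfident: dropped
                  [0.75, 0.20, 0.05]])  # confident but high-entropy: dropped
idx, labels = select_pseudolabels(probs)
```

The third row shows why the two filters differ: its max probability clears the confidence threshold, but its spread-out tail makes the entropy too high, so it is rejected.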
arXiv Detail & Related papers (2021-01-15T23:29:57Z)
- Iterative label cleaning for transductive and semi-supervised few-shot learning [16.627512688664513]
Few-shot learning amounts to learning representations and acquiring knowledge such that novel tasks may be solved with both supervision and data being limited.
We introduce a new algorithm that leverages the manifold structure of the labeled and unlabeled data distribution to predict pseudo-labels.
Our solution surpasses or matches the state of the art results on four benchmark datasets.
arXiv Detail & Related papers (2020-12-14T21:54:11Z)
- Boosting the Performance of Semi-Supervised Learning with Unsupervised Clustering [10.033658645311188]
We show that ignoring labels altogether for whole epochs intermittently during training can significantly improve performance in the small sample regime.
We demonstrate our method's efficacy in boosting several state-of-the-art SSL algorithms.
arXiv Detail & Related papers (2020-12-01T14:19:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.