Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior
Refinement
- URL: http://arxiv.org/abs/2304.01195v1
- Date: Mon, 3 Apr 2023 17:58:54 GMT
- Title: Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior
Refinement
- Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin
Zhao, Peng Gao
- Abstract summary: We propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency.
For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art results and respectively outperform the second-best by +1.59% and +1.99% under 16 shots with 30× fewer learnable parameters.
- Score: 24.108008515395458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The popularity of Contrastive Language-Image Pre-training (CLIP) has
propelled its application to diverse downstream vision tasks. To improve its
capacity on downstream tasks, few-shot learning has become a widely-adopted
technique. However, existing methods either exhibit limited performance or
suffer from excessive learnable parameters. In this paper, we propose APE, an
Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which
achieves superior accuracy with high computational efficiency. Via a prior
refinement module, we analyze the inter-class disparity in the downstream data
and decouple the domain-specific knowledge from the CLIP-extracted cache model.
On top of that, we introduce two model variants, a training-free APE and a
training-required APE-T. We explore the trilateral affinities between the test
image, prior cache model, and textual representations, and only enable a
lightweight category-residual module to be trained. For the average accuracy
over 11 benchmarks, both APE and APE-T attain state-of-the-art results, respectively
outperforming the second-best by +1.59% and +1.99% under 16 shots with 30× fewer
learnable parameters.
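To make the described pipeline concrete, below is a minimal NumPy sketch of a training-free, APE-style classifier: it selects the most discriminative feature channels from the few-shot cache and blends a cache-model affinity with CLIP's zero-shot text logits. The channel-selection criterion (variance of the class means), the hyperparameters, and all function names are illustrative assumptions, not the paper's exact refinement rule.

```python
# Hedged sketch of a training-free, APE-style few-shot classifier on CLIP features.
# The channel-selection criterion and hyperparameters below are stand-ins, not the
# paper's exact prior-refinement formulation.
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def refine_channels(cache_feats, cache_labels, num_classes, top_k=512):
    """Pick the top_k feature channels with the largest inter-class spread."""
    means = np.stack([cache_feats[cache_labels == c].mean(0)
                      for c in range(num_classes)])          # (C, D) class means
    spread = means.var(axis=0)                               # (D,) per-channel disparity
    return np.argsort(spread)[::-1][:top_k]                  # indices of refined channels

def ape_like_logits(test_feat, text_feats, cache_feats, cache_onehot,
                    channels, alpha=1.0, beta=5.0):
    """Blend zero-shot text logits with a cache-model affinity on refined channels."""
    zero_shot = 100.0 * test_feat @ text_feats.T             # (C,) standard CLIP logits
    q = l2_normalize(test_feat[channels])                    # refined test feature
    k = l2_normalize(cache_feats[:, channels])                # refined cache keys
    affinity = np.exp(-beta * (1.0 - q @ k.T))                # (N,) similarity to cache
    cache_logits = affinity @ cache_onehot                    # (C,) vote per class
    return zero_shot + alpha * cache_logits

# Toy usage with random stand-ins for CLIP features (D=512, 4 classes, 16 shots).
rng = np.random.default_rng(0)
D, C, K = 512, 4, 16
text_feats = l2_normalize(rng.normal(size=(C, D)))
cache_feats = l2_normalize(rng.normal(size=(C * K, D)))
cache_labels = np.repeat(np.arange(C), K)
cache_onehot = np.eye(C)[cache_labels]
test_feat = l2_normalize(rng.normal(size=D))
channels = refine_channels(cache_feats, cache_labels, C, top_k=128)
pred = ape_like_logits(test_feat, text_feats, cache_feats, cache_onehot, channels).argmax()
```

Per the abstract, APE-T would additionally train a lightweight category-residual module on top of this; the sketch above stays fully training-free.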
Related papers
- A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP.
arXiv Detail & Related papers (2024-02-06T15:45:27Z)
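As a rough illustration of applying GDA to frozen CLIP features, the sketch below fits a tied-covariance Gaussian per class and reads off a linear classifier; the uniform class priors, the covariance regularizer, and the omission of any fusion with zero-shot logits are assumptions rather than the cited paper's exact recipe.

```python
# Hedged sketch of Gaussian Discriminant Analysis on frozen CLIP features,
# assuming a shared (tied) covariance and uniform class priors.
import numpy as np

def fit_gda(feats, labels, num_classes, eps=1e-4):
    """Return per-class linear weights/biases from a tied-covariance GDA fit."""
    d = feats.shape[1]
    mus = np.stack([feats[labels == c].mean(0) for c in range(num_classes)])  # (C, D)
    centered = feats - mus[labels]                           # subtract each sample's class mean
    cov = centered.T @ centered / len(feats) + eps * np.eye(d)  # tied covariance, regularized
    prec = np.linalg.inv(cov)
    W = mus @ prec                                           # (C, D) class weights
    b = -0.5 * np.einsum('cd,cd->c', W, mus)                 # (C,) class biases
    return W, b

def gda_predict(test_feats, W, b):
    """Classify CLIP features with the fitted linear discriminants."""
    return (test_feats @ W.T + b).argmax(axis=1)
```

With a shared covariance, each class score reduces to a linear function of the feature, so inference over frozen CLIP embeddings is a single matrix multiply.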
- Class Incremental Learning with Pre-trained Vision-Language Models
We propose an approach to exploiting pre-trained vision-language models (e.g., CLIP) that enables further adaptation.
Experiments on several conventional benchmarks consistently show a significant margin of improvement over the current state-of-the-art.
arXiv Detail & Related papers (2023-10-31T10:45:03Z)
- RanPAC: Random Projections and Pre-trained Models for Continual Learning
Continual learning (CL) aims to learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones.
We propose a concise and effective approach for CL with pre-trained models.
arXiv Detail & Related papers (2023-07-05T12:49:02Z)
- Boosting Visual-Language Models by Exploiting Hard Samples
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Selective classification using a robust meta-learning approach
We propose a novel instance-conditioned reweighting approach that captures predictive uncertainty using an auxiliary network.
We show in controlled experiments that we effectively capture the diverse specific notions of uncertainty through this meta-objective.
For diabetic retinopathy, we see up to 3.4%/3.3% accuracy and AUC gains over SOTA in selective classification.
arXiv Detail & Related papers (2022-12-12T15:45:23Z)
- CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Contrastive Language-Image Pre-training has been shown to learn visual representations with great transferability.
Existing works propose additional learnable modules upon CLIP and fine-tune them by few-shot training sets.
We introduce a free-lunch enhancement method, CALIP, to boost CLIP's zero-shot performance via a parameter-free Attention module.
arXiv Detail & Related papers (2022-09-28T15:22:11Z)
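For intuition, here is a hedged sketch of a parameter-free cross-modal attention step in the spirit of CALIP: visual patch tokens and class text embeddings attend to each other through a plain similarity map, and the updated branches are blended with the zero-shot logits. The temperature, blending weights, and pooling choices are assumptions, not the paper's exact design.

```python
# Hedged sketch of parameter-free cross-modal attention over CLIP features;
# nothing below is learned, and all hyperparameters are illustrative guesses.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parameter_free_attention(spatial_feats, text_feats, tau=0.07):
    """Visual patch tokens (P, D) and class text embeddings (C, D) attend to each other."""
    attn = spatial_feats @ text_feats.T / tau                 # (P, C) similarity map
    vis_updated = softmax(attn, axis=1) @ text_feats          # (P, D) text-aware patches
    txt_updated = softmax(attn.T, axis=1) @ spatial_feats     # (C, D) visual-aware text
    return vis_updated, txt_updated

def calip_like_logits(global_feat, spatial_feats, text_feats, w_v=0.5, w_t=0.5):
    """Blend zero-shot logits with the two attention-updated branches."""
    vis_u, txt_u = parameter_free_attention(spatial_feats, text_feats)
    zero_shot = global_feat @ text_feats.T                    # (C,) standard CLIP logits
    visual_branch = vis_u.mean(axis=0) @ text_feats.T         # pooled updated visual vs. text
    textual_branch = global_feat @ txt_u.T                    # global visual vs. updated text
    return zero_shot + w_v * visual_branch + w_t * textual_branch
```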
- Efficient Fine-Tuning of Compressed Language Models with Learners
We introduce Learner modules and priming, novel methods for fine-tuning BERT-based models.
Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores.
Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines.
arXiv Detail & Related papers (2022-08-03T13:42:30Z)
- Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations using large-scale image-text pairs.
To enhance CLIP's adaption capability, existing methods proposed to fine-tune additional learnable modules.
We propose a training-free adaption method for CLIP to conduct few-shot classification, termed Tip-Adapter.
arXiv Detail & Related papers (2022-07-19T19:12:11Z)