OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal
Regression
- URL: http://arxiv.org/abs/2206.02338v1
- Date: Mon, 6 Jun 2022 03:54:53 GMT
- Title: OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal
Regression
- Authors: Wanhua Li, Xiaoke Huang, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu,
Jie Zhou
- Abstract summary: We propose to learn the rank concepts from the rich semantic CLIP latent space.
OrdinalCLIP consists of learnable context tokens and learnable rank embeddings.
Experimental results show that our paradigm achieves competitive performance in general ordinal regression tasks.
- Score: 94.28253749970534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a language-powered paradigm for ordinal regression.
Existing methods usually treat each rank as a category and employ a set of
weights to learn these concepts. These methods are prone to overfitting and usually
attain unsatisfactory performance, as the learned concepts are derived mainly
from the training set. Recent large pre-trained vision-language models like
CLIP have shown impressive performance on various visual tasks. In this paper,
we propose to learn the rank concepts from the rich semantic CLIP latent space.
Specifically, we reformulate this task as an image-language matching problem
with a contrastive objective, which regards labels as text and obtains a
language prototype from a text encoder for each rank. Since prompt engineering
for CLIP is extremely time-consuming, we propose OrdinalCLIP, a differentiable
prompting method for adapting CLIP for ordinal regression. OrdinalCLIP consists
of learnable context tokens and learnable rank embeddings; the rank
embeddings are constructed by explicitly modeling numerical continuity,
resulting in well-ordered, compact language prototypes in the CLIP space. Once
learned, only the language prototypes need to be saved and the huge language
model can be discarded, resulting in zero additional computational overhead compared with the
linear-head counterpart. Experimental results show that our paradigm achieves
competitive performance in general ordinal regression tasks, and gains
improvements in few-shot and distribution shift settings for age estimation.
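
To make the idea concrete, below is a minimal PyTorch sketch of the paradigm described in the abstract, not the authors' implementation: shared learnable context tokens, a small set of learnable base rank embeddings from which every rank is linearly interpolated (so neighbouring ranks stay close, modeling numerical continuity), and a text encoder that maps each prompt to a language prototype matched contrastively against image features. The tiny stand-in text encoder, the feature dimensions, the logit scale, and the expected-rank decoding are illustrative assumptions; in the paper the text encoder is the frozen CLIP language model.

```python
# Minimal sketch of the OrdinalCLIP idea (illustrative assumptions, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RankPrompts(nn.Module):
    """Learnable context tokens plus rank embeddings interpolated from a few base embeddings."""

    def __init__(self, num_ranks=100, num_base=10, n_ctx=4, dim=64):
        super().__init__()
        self.context = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)       # shared context tokens
        self.base_ranks = nn.Parameter(torch.randn(num_base, dim) * 0.02)  # base rank embeddings
        # Fixed interpolation weights: each rank is a convex combination of its two
        # nearest base embeddings, which keeps the resulting prototypes well ordered.
        pos = torch.linspace(0, num_base - 1, num_ranks)
        lo = pos.floor().long().clamp(max=num_base - 2)
        w = (pos - lo.float()).unsqueeze(1)
        weights = torch.zeros(num_ranks, num_base)
        weights.scatter_(1, lo.unsqueeze(1), 1 - w)
        weights.scatter_(1, (lo + 1).unsqueeze(1), w)
        self.register_buffer("weights", weights)

    def forward(self):
        rank_emb = self.weights @ self.base_ranks                          # (num_ranks, dim)
        ctx = self.context.unsqueeze(0).expand(rank_emb.size(0), -1, -1)
        return torch.cat([ctx, rank_emb.unsqueeze(1)], dim=1)             # (num_ranks, n_ctx+1, dim)


class TinyTextEncoder(nn.Module):
    """Stand-in for the frozen CLIP text encoder, so the sketch runs without the CLIP package."""

    def __init__(self, dim=64, out_dim=32):
        super().__init__()
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, tokens):                                             # (num_ranks, seq, dim)
        return F.normalize(self.proj(tokens.mean(dim=1)), dim=-1)         # one prototype per rank


prompts, text_enc = RankPrompts(), TinyTextEncoder()
image_feat = F.normalize(torch.randn(8, 32), dim=-1)                      # stand-in for CLIP image features
prototypes = text_enc(prompts())                                          # (num_ranks, 32) language prototypes
logits = 100.0 * image_feat @ prototypes.t()                              # contrastive image-language matching
probs = logits.softmax(dim=-1)
ranks = torch.arange(probs.size(1), dtype=torch.float)
pred = probs @ ranks                                                       # expected rank, e.g. a predicted age
```

At inference only the prototype matrix is needed, so the text encoder (and, in the real system, the full CLIP language model) can be dropped, which is what makes the overhead comparable to a plain linear head.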
Related papers
- Teach CLIP to Develop a Number Sense for Ordinal Regression [10.046473198947432]
We first investigate CLIP's potential for ordinal regression, expecting that the model can generalise to different ordinal regression tasks and scenarios.
Unfortunately, vanilla CLIP fails on this task, since current VLMs have a well-documented limitation in capturing compositional concepts such as number sense.
We propose a simple yet effective method called NumCLIP to improve the quantitative understanding of VLMs.
arXiv Detail & Related papers (2024-08-07T06:26:04Z)
- A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation [121.0693322732454]
Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity.
Recent research has focused on developing efficient fine-tuning methods to enhance CLIP's performance in downstream tasks.
We revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to CLIP's downstream classification.
arXiv Detail & Related papers (2024-02-06T15:45:27Z)
- Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z)
- Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification [60.28913031192201]
We present a novel language-driven ordering alignment method for ordinal classification.
Recent developments in pre-trained vision-language models inspire us to leverage the rich ordinal priors in human language.
Experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance.
arXiv Detail & Related papers (2023-06-24T04:11:31Z)
- AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning [53.32576252950481]
Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data.
In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks.
arXiv Detail & Related papers (2023-05-19T07:39:17Z)
- Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category.
We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP.
Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)