Learning-to-Rank Meets Language: Boosting Language-Driven Ordering
Alignment for Ordinal Classification
- URL: http://arxiv.org/abs/2306.13856v3
- Date: Mon, 23 Oct 2023 12:20:56 GMT
- Title: Learning-to-Rank Meets Language: Boosting Language-Driven Ordering
Alignment for Ordinal Classification
- Authors: Rui Wang, Peipei Li, Huaibo Huang, Chunshui Cao, Ran He, Zhaofeng He
- Abstract summary: We present a novel language-driven ordering alignment method for ordinal classification.
Recent developments in pre-trained vision-language models inspire us to leverage the rich ordinal priors in human language.
Experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment demonstrate its promising performance.
- Score: 60.28913031192201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel language-driven ordering alignment method for ordinal
classification. The labels in ordinal classification contain additional
ordering relations, making them prone to overfitting when relying solely on
training data. Recent developments in pre-trained vision-language models
inspire us to leverage the rich ordinal priors in human language by converting
the original task into a visionlanguage alignment task. Consequently, we
propose L2RCLIP, which fully utilizes the language priors from two
perspectives. First, we introduce a complementary prompt tuning technique
called RankFormer, designed to enhance the ordering relation of original rank
prompts. It employs token-level attention with residual-style prompt blending
in the word embedding space. Second, to further incorporate language priors, we
revisit the approximate bound optimization of vanilla cross-entropy loss and
restructure it within the cross-modal embedding space. Consequently, we propose
a cross-modal ordinal pairwise loss to refine the CLIP feature space, where
texts and images maintain both semantic alignment and ordering alignment.
Extensive experiments on three ordinal classification tasks, including facial
age estimation, historical color image (HCI) classification, and aesthetic
assessment demonstrate its promising performance. The code is available at
https://github.com/raywang335/L2RCLIP.
Related papers
- Teach CLIP to Develop a Number Sense for Ordinal Regression [10.046473198947432]
We first investigate CLIP's potential for ordinal regression, from which we expect the model could generalise to different ordinal regression tasks and scenarios.
Unfortunately, vanilla CLIP fails on this task, since current VLMs have a well-documented limitation of encapsulating compositional concepts such as number sense.
We propose a simple yet effective method called NumCLIP to improve the quantitative understanding of VLMs.
arXiv Detail & Related papers (2024-08-07T06:26:04Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Prompting Language-Informed Distribution for Compositional Zero-Shot Learning [73.49852821602057]
Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts.
We propose a model by prompting the language-informed distribution, aka., PLID, for the task.
Experimental results on MIT-States, UT-Zappos, and C-GQA datasets show the superior performance of the PLID to the prior arts.
arXiv Detail & Related papers (2023-05-23T18:00:22Z) - CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class
Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z) - OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal
Regression [94.28253749970534]
We propose to learn the rank concepts from the rich semantic CLIP latent space.
OrdinalCLIP consists of learnable context tokens and learnable rank embeddings.
Experimental results show that our paradigm achieves competitive performance in general ordinal regression tasks.
arXiv Detail & Related papers (2022-06-06T03:54:53Z) - SLM: Learning a Discourse Language Representation with Sentence
Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.