CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
- URL: http://arxiv.org/abs/2302.02551v3
- Date: Wed, 31 May 2023 07:44:28 GMT
- Title: CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
- Authors: Zachary Novack, Julian McAuley, Zachary C. Lipton, Saurabh Garg
- Abstract summary: Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification.
We propose Classification with Hierarchical Label Sets (or CHiLS) for datasets with implicit semantic hierarchies.
CHiLS is simple to implement within existing zero-shot pipelines and requires no additional training cost.
- Score: 24.868024094095983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot
classification through their ability to generate embeddings for each class based
on their (natural language) names. Prior work has focused on improving the
accuracy of these models through prompt engineering or by incorporating a small
amount of labeled downstream data (via finetuning). However, there has been
little focus on improving the richness of the class names themselves, which can
pose issues when class labels are coarsely-defined and are uninformative. We
propose Classification with Hierarchical Label Sets (or CHiLS), an alternative
strategy for zero-shot classification specifically designed for datasets with
implicit semantic hierarchies. CHiLS proceeds in three steps: (i) for each
class, produce a set of subclasses, using either existing label hierarchies or
by querying GPT-3; (ii) perform the standard zero-shot CLIP procedure as though
these subclasses were the labels of interest; (iii) map the predicted subclass
back to its parent to produce the final prediction. Across numerous datasets
with underlying hierarchical structure, CHiLS leads to improved accuracy in
situations both with and without ground-truth hierarchical information. CHiLS
is simple to implement within existing zero-shot pipelines and requires no
additional training cost. Code is available at:
https://github.com/acmi-lab/CHILS.
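The three-step procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `image_score` callable is a hypothetical stand-in for CLIP's image-text similarity, and the toy hierarchy and scores are invented for demonstration.

```python
def chils_predict(image_score, hierarchy):
    """Sketch of CHiLS inference.

    hierarchy: dict mapping each parent class to a list of subclasses,
        obtained from an existing label hierarchy or by querying GPT-3.
    image_score: callable scoring a candidate label name against the
        image (a stand-in for CLIP image-text similarity).
    """
    # (i) for each class, gather its subclass set, remembering parents
    subclass_to_parent = {
        sub: parent
        for parent, subs in hierarchy.items()
        for sub in subs
    }
    # (ii) run the standard zero-shot procedure as though the
    #      subclasses were the labels of interest
    best_subclass = max(subclass_to_parent, key=image_score)
    # (iii) map the predicted subclass back to its parent
    return subclass_to_parent[best_subclass]


# Toy usage with made-up similarity scores in place of CLIP outputs.
hierarchy = {"dog": ["beagle", "poodle"], "cat": ["siamese", "tabby"]}
scores = {"beagle": 0.2, "poodle": 0.7, "siamese": 0.4, "tabby": 0.1}
print(chils_predict(scores.get, hierarchy))  # → dog
```

Note that because prediction happens over subclasses and is only then mapped upward, no retraining is needed: the method reuses the existing zero-shot pipeline with a richer label set.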
Related papers
- Lidar Panoptic Segmentation in an Open World [50.094491113541046]
Lidar Panoptics (LPS) is crucial for safe deployment of autonomous vehicles.
LPS aims to recognize and segment lidar points w.r.t. a pre-defined vocabulary of semantic classes.
We propose a class-agnostic point clustering and over-segment the input cloud in a hierarchical fashion, followed by binary point segment classification.
arXiv Detail & Related papers (2024-09-22T00:10:20Z) - Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z) - TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision [41.05874642535256]
Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy.
Most earlier works focus on fully or semi-supervised methods that require a large amount of human annotated data.
We work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision.
arXiv Detail & Related papers (2024-02-29T22:26:07Z) - TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary
Multi-Label Classification of CLIP Without Training [29.431698321195814]
Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification.
CLIP shows poor performance on multi-label datasets because the global feature tends to be dominated by the most prominent class.
We propose a local-to-global framework to obtain image tags.
arXiv Detail & Related papers (2023-12-20T08:15:40Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - ProTeCt: Prompt Tuning for Taxonomic Open Set Classification [59.59442518849203]
Few-shot adaptation methods do not fare well in the taxonomic open set (TOS) setting.
We propose Prompt Tuning for Hierarchical Consistency (ProTeCt), a prompt tuning technique that calibrates the hierarchical consistency of model predictions across label set granularities.
arXiv Detail & Related papers (2023-06-04T02:55:25Z) - Instance-level Few-shot Learning with Class Hierarchy Mining [26.273796311012042]
We exploit hierarchical information to leverage discriminative and relevant features of base classes to effectively classify novel objects.
These features are extracted from abundant data of base classes, which could be utilized to reasonably describe classes with scarce data.
In order to effectively train the hierarchy-based-detector in FSIS, we apply the label refinement to further describe the associations between fine-grained classes.
arXiv Detail & Related papers (2023-04-15T02:55:08Z) - Inducing a hierarchy for multi-class classification problems [11.58041597483471]
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not.
In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers.
We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
arXiv Detail & Related papers (2021-02-20T05:40:42Z) - An Empirical Study on Large-Scale Multi-Label Text Classification
Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs)
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z) - Attribute Propagation Network for Graph Zero-shot Learning [57.68486382473194]
We introduce the attribute propagation network (APNet), which is composed of 1) a graph propagation model generating an attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier.
APNet achieves either compelling performance or new state-of-the-art results in experiments with two zero-shot learning settings and five benchmark datasets.
arXiv Detail & Related papers (2020-09-24T16:53:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.