Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training
- URL: http://arxiv.org/abs/2412.07161v1
- Date: Tue, 10 Dec 2024 03:41:20 GMT
- Title: Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training
- Authors: Yun Li, Zhe Liu, Lina Yao
- Abstract summary: This paper introduces a novel framework, Understanding and Linking Attributes and Objects (ULAO), for Compositional Zero-Shot Learning (CZSL).
ULAO comprises two innovative modules. The Understanding Attributes and Objects (UAO) module improves primitive understanding through sequential primitive prediction, leveraging recognized objects as contextual hints for attribute classification.
The Linking Attributes and Objects (LAO) module improves the attribute-object linkage understanding through a new contrastive learning strategy that incorporates tailored hard negative generation and adaptive loss adjustments.
- Score: 17.893694262999826
- Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen combinations of seen attributes and objects. Current CLIP-based methods in CZSL, despite their advancements, often fail to effectively understand and link the attributes and objects due to inherent limitations in CLIP's pretraining mechanisms. To address these shortcomings, this paper introduces a novel framework, Understanding and Linking Attributes and Objects (ULAO) in CZSL, which comprises two innovative modules. The Understanding Attributes and Objects (UAO) module improves primitive understanding by sequential primitive prediction and leveraging recognized objects as contextual hints for attribute classification. Concurrently, the Linking Attributes and Objects (LAO) module improves the attribute-object linkage understanding through a new contrastive learning strategy that incorporates tailored hard negative generation and adaptive loss adjustments. We demonstrate our model's superiority by showcasing its state-of-the-art performance across three benchmark datasets in both Closed-World (CW) and Open-World (OW) scenarios.
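The abstract describes both mechanisms only at a high level. As a concrete illustration, here is a minimal PyTorch-style sketch of the two ideas as stated: an object-first head that feeds the recognized object back as a contextual hint for attribute classification (UAO), and an InfoNCE-style contrastive loss over tailored hard negatives with an adaptive per-sample weight (LAO). All module names, dimensions, the fusion scheme, and the weighting rule are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequentialPrimitiveHead(nn.Module):
    """UAO-style sketch: predict the object first, then condition the
    attribute prediction on the recognized object as a contextual hint.
    Dimensions and the concatenation-based fusion are assumptions."""

    def __init__(self, feat_dim: int, num_objects: int, num_attributes: int):
        super().__init__()
        self.object_head = nn.Linear(feat_dim, num_objects)
        self.object_embed = nn.Embedding(num_objects, feat_dim)
        # The attribute head sees the image feature fused with the object hint.
        self.attr_head = nn.Linear(2 * feat_dim, num_attributes)

    def forward(self, img_feat: torch.Tensor):
        obj_logits = self.object_head(img_feat)              # step 1: object
        # argmax is non-differentiable; a soft mixture over object embeddings
        # is a common differentiable alternative.
        obj_hint = self.object_embed(obj_logits.argmax(-1))  # object as context
        attr_logits = self.attr_head(torch.cat([img_feat, obj_hint], dim=-1))
        return obj_logits, attr_logits

def adaptive_contrastive_loss(anchor, positive, hard_negatives, temperature=0.07):
    """LAO-style sketch: InfoNCE over tailored hard negatives (e.g., the same
    object paired with a wrong attribute), upweighting samples whose hardest
    negative sits closest to the anchor. The weighting rule is an assumption."""
    anchor = F.normalize(anchor, dim=-1)                  # (B, D)
    positive = F.normalize(positive, dim=-1)              # (B, D)
    hard_negatives = F.normalize(hard_negatives, dim=-1)  # (B, K, D)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)           # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, hard_negatives)  # (B, K)

    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    per_sample = F.cross_entropy(logits, labels, reduction="none")

    # Adaptive adjustment: samples with closer hard negatives weigh more.
    hardness = neg_sim.max(dim=1).values.detach()
    weights = 1.0 + hardness.clamp(min=0.0)
    return (weights * per_sample).mean()
```
Detaching the hardness term means the adaptive weight only rescales each sample's loss rather than feeding gradients back through the negatives.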
Related papers
- Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition [1.2499537119440243]
We tackle zero-shot "real" classification by description, a novel task that evaluates the ability of Vision-Language Models (VLMs) to classify objects based solely on descriptive attributes, excluding object class names.
We release description data for six popular fine-grained benchmarks, which omit object names to encourage genuine zero-shot learning.
We introduce a modified CLIP architecture that leverages multiple resolutions to improve the detection of fine-grained part attributes.
arXiv Detail & Related papers (2024-12-18T15:28:08Z)
- Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms, but they can leave informative regions under-attended.
PointACL is an attention-driven contrastive learning framework designed to address this limitation.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions; a minimal sketch of this idea follows the entry.
arXiv Detail & Related papers (2024-11-22T05:41:00Z)
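A minimal sketch of one plausible reading of the dynamic masking step above, assuming access to a (B, heads, N, N) self-attention map from a point-cloud transformer; the keep ratio and the keep-the-least-attended rule are assumptions, not PointACL's published procedure.
```python
import torch

def under_attended_keep_mask(attn: torch.Tensor, keep_ratio: float = 0.5):
    """Sketch: from a (B, heads, N, N) self-attention map, keep visible the
    tokens that RECEIVE the least attention, so a contrastive objective is
    forced to rely on under-attended regions. Details are assumptions."""
    # Average attention each token receives, over heads and query positions.
    received = attn.mean(dim=1).mean(dim=1)              # (B, N)
    n_keep = max(1, int(received.size(1) * keep_ratio))
    # The lowest-scoring tokens are the "under-attended" ones.
    idx = received.topk(n_keep, dim=1, largest=False).indices
    mask = torch.zeros_like(received, dtype=torch.bool)
    mask.scatter_(1, idx, True)                          # True = token kept
    return mask
```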
- Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning [21.488599805772054]
Compositional zero-shot learning aims to recognize novel compositions of attributes and objects learned from seen compositions.
Previous works disentangle attribute and object by extracting shared and exclusive parts between image pairs sharing the same attribute (object).
We propose a novel framework named Multimodal Large Language Model (MLLM) embeddings and attribute smoothing guided disentanglement (TRIDENT) for CZSL.
arXiv Detail & Related papers (2024-11-18T07:55:54Z)
- Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [49.919635694894204]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL).
We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
arXiv Detail & Related papers (2024-08-19T08:23:09Z)
- CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning [62.090051975043544]
Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL).
We propose a novel A-O disentangled framework for CZSL, namely the Class-specified Cascaded Network (CSCNet).
arXiv Detail & Related papers (2024-03-09T14:18:41Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Learning Conditional Attributes for Compositional Zero-Shot Learning [78.24309446833398]
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts.
One of the challenges is to model attributes as they interact with different objects, e.g., the attribute "wet" in "wet apple" and "wet cat" is different.
We argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings; a minimal sketch of this idea follows the entry.
arXiv Detail & Related papers (2023-05-29T08:04:05Z)
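To make the "wet apple" vs. "wet cat" example concrete, here is a minimal sketch in which a FiLM-style modulation derives a per-instance attribute embedding from the object identity and the image feature; the fusion scheme and layer sizes are illustrative assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class ConditionalAttributeEmbedding(nn.Module):
    """Sketch: the embedding of an attribute (e.g., "wet") is modulated by
    the recognized object and the image feature, so "wet apple" and
    "wet cat" yield different attribute vectors. Sizes are assumptions."""

    def __init__(self, num_attrs: int, num_objs: int, dim: int):
        super().__init__()
        self.attr_embed = nn.Embedding(num_attrs, dim)
        self.obj_embed = nn.Embedding(num_objs, dim)
        # Scale and shift for the attribute vector, from object + image.
        self.film = nn.Linear(2 * dim, 2 * dim)

    def forward(self, attr_ids, obj_ids, img_feat):
        base = self.attr_embed(attr_ids)                          # (B, D)
        ctx = torch.cat([self.obj_embed(obj_ids), img_feat], -1)  # (B, 2D)
        scale, shift = self.film(ctx).chunk(2, dim=-1)
        return base * (1 + scale) + shift  # conditioned attribute embedding
```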
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
- Attribute-Modulated Generative Meta Learning for Zero-Shot Classification [52.64680991682722]
We present the Attribute-Modulated generAtive meta-model for Zero-shot learning (AMAZ).
Our model consists of an attribute-aware modulation network and an attribute-augmented generative network.
Our empirical evaluations show that AMAZ improves state-of-the-art methods by 3.8% and 5.1% in ZSL and generalized ZSL settings, respectively.
arXiv Detail & Related papers (2021-04-22T04:16:43Z)