CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2403.05924v2
- Date: Wed, 13 Mar 2024 11:36:43 GMT
- Title: CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning
- Authors: Yanyi Zhang, Qi Jia, Xin Fan, Yu Liu, Ran He
- Abstract summary: Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL)
We propose a novel A-O disentangled framework for CZSL, namely Class-specified Cascaded Network (CSCNet)
- Score: 62.090051975043544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attribute and object (A-O) disentanglement is a fundamental and
critical problem for Compositional Zero-shot Learning (CZSL), whose aim is to
recognize novel A-O compositions based on previously acquired knowledge.
Existing methods based on disentangled representation learning lose sight of
the contextual dependency between the A-O primitive pairs. Motivated by this,
we propose a novel A-O disentangled framework for CZSL, namely the
Class-specified Cascaded Network (CSCNet). The key insight is to first
classify one primitive and then specify the predicted class as a prior to
guide the recognition of the other primitive in a cascaded fashion. To this
end, CSCNet constructs Attribute-to-Object and Object-to-Attribute cascaded
branches, in addition to a composition branch that models the two primitives
as a whole. Notably, we devise a parametric classifier (ParamCls) to improve
the matching between visual and semantic embeddings. By improving the A-O
disentanglement, our framework achieves superior results to previous
competitive methods.
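
To make the cascaded design concrete, below is a minimal PyTorch sketch of the idea as the abstract describes it. The module names, feature dimensions, and the soft-prior fusion (feeding the expected class embedding of the first primitive into the second classifier) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamCls(nn.Module):
    """Parametric classifier: scores a visual embedding against learnable
    class embeddings through a learned projection (a hypothetical stand-in
    for the paper's ParamCls)."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.class_emb = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (B, dim)
        return self.proj(x) @ self.class_emb.t()           # (B, num_classes)

class CascadedBranch(nn.Module):
    """Classify the first primitive, then use its prediction as a prior
    when classifying the second primitive (A-to-O or O-to-A)."""
    def __init__(self, dim, n_first, n_second):
        super().__init__()
        self.first_cls = ParamCls(dim, n_first)
        self.first_emb = nn.Embedding(n_first, dim)        # class prior table
        self.fuse = nn.Linear(2 * dim, dim)
        self.second_cls = ParamCls(dim, n_second)

    def forward(self, feat):                               # feat: (B, dim)
        first_logits = self.first_cls(feat)
        # Soft prior: expected class embedding under the first prediction.
        prior = F.softmax(first_logits, dim=-1) @ self.first_emb.weight
        fused = self.fuse(torch.cat([feat, prior], dim=-1))
        return first_logits, self.second_cls(fused)

class CSCNetSketch(nn.Module):
    def __init__(self, dim, n_attr, n_obj, n_comp):
        super().__init__()
        self.a2o = CascadedBranch(dim, n_attr, n_obj)      # Attribute-to-Object
        self.o2a = CascadedBranch(dim, n_obj, n_attr)      # Object-to-Attribute
        self.comp_cls = ParamCls(dim, n_comp)              # composition branch

    def forward(self, feat):
        attr_logits, obj_from_attr = self.a2o(feat)
        obj_logits, attr_from_obj = self.o2a(feat)
        comp_logits = self.comp_cls(feat)                  # two primitives as a whole
        return attr_logits, obj_from_attr, obj_logits, attr_from_obj, comp_logits
```

A full model would add a visual backbone and combine losses over all five outputs; the sketch only shows how one primitive's prediction conditions the other.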
Related papers
- Learning Clustering-based Prototypes for Compositional Zero-shot Learning [56.57299428499455]
ClusPro is a robust clustering-based prototype mining framework for Compositional Zero-Shot Learning.
It defines the conceptual boundaries of primitives through a set of diversified prototypes.
ClusPro efficiently performs prototype clustering in a non-parametric fashion, without introducing additional learnable parameters.
arXiv Detail & Related papers (2025-02-10T14:20:01Z)
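
The non-parametric flavor can be illustrated with momentum-updated prototypes; the update rule, the number of prototypes per class, and the temperature below are assumptions, not ClusPro's actual procedure.

```python
import torch
import torch.nn.functional as F

def update_prototypes(protos, feats, labels, momentum=0.9):
    """Toy non-parametric update: move the nearest prototype of each sample's
    ground-truth class toward the sample feature (no learnable parameters).
    protos: (C, K, D) -- K diversified prototypes per class; feats: (B, D)."""
    feats = F.normalize(feats, dim=-1)
    for f, y in zip(feats, labels):
        k = (protos[y] @ f).argmax()                   # nearest class-y prototype
        protos[y, k] = F.normalize(momentum * protos[y, k] + (1 - momentum) * f, dim=-1)
    return protos

def prototype_logits(protos, feats, temp=0.1):
    """Score each feature against every class via its best-matching prototype."""
    feats = F.normalize(feats, dim=-1)
    sims = torch.einsum('bd,ckd->bck', feats, protos)  # (B, C, K) cosine sims
    return sims.max(dim=-1).values / temp              # (B, C) class logits
```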
- Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training [17.893694262999826]
This paper introduces a novel framework, Understanding and Linking Attributes and Objects (ULAO), for Compositional Zero-Shot Learning (CZSL).
ULAO comprises two innovative modules. The Understanding Attributes and Objects (UAO) module improves primitive understanding by sequential primitive prediction and leveraging recognized objects as contextual hints for attribute classification.
The Linking Attributes and Objects (LAO) module improves the attribute-object linkage understanding through a new contrastive learning strategy that incorporates tailored hard negative generation and adaptive loss adjustments.
arXiv Detail & Related papers (2024-12-10T03:41:20Z)
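
The LAO objective can be pictured as a generic InfoNCE loss over tailored hard negatives; the shapes and the fixed temperature below are assumptions, since the summary does not give the exact formulation.

```python
import torch
import torch.nn.functional as F

def linkage_contrastive_loss(img_emb, pos_emb, hard_neg_embs, temp=0.07):
    """InfoNCE-style loss in the spirit of LAO: pull each image toward its true
    attribute-object composition embedding and push it away from tailored hard
    negatives (e.g., the same object paired with a swapped attribute).
    img_emb: (B, D); pos_emb: (B, D); hard_neg_embs: (B, N, D)."""
    img = F.normalize(img_emb, dim=-1)
    pos = F.normalize(pos_emb, dim=-1)
    neg = F.normalize(hard_neg_embs, dim=-1)
    pos_sim = (img * pos).sum(-1, keepdim=True)             # (B, 1)
    neg_sim = torch.einsum('bd,bnd->bn', img, neg)          # (B, N)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temp
    target = torch.zeros(img.size(0), dtype=torch.long, device=img.device)
    return F.cross_entropy(logits, target)                  # positive at index 0
```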
- Knowledge Adaptation Network for Few-Shot Class-Incremental Learning [23.90555521006653]
Few-shot class-incremental learning aims to incrementally recognize new classes using only a few samples.
One effective way to address this challenge is to construct prototypical evolution classifiers.
Because the representations of new classes are weak and biased, we argue that such a strategy is suboptimal.
arXiv Detail & Related papers (2024-09-18T07:51:38Z)
- Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [49.919635694894204]
Disentanglement of the visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL).
We propose cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
arXiv Detail & Related papers (2024-08-19T08:23:09Z)
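
One way to picture the cross-composition constraint is a simple consistency penalty: the attribute feature disentangled from several compositions sharing that attribute (e.g., "wet dog", "wet car") should agree. The variance penalty below is an assumed approximation, not the paper's formulation.

```python
import torch

def cross_composition_consistency(attr_feats):
    """attr_feats: (M, D) -- the disentangled attribute feature from each of
    M primitive-sharing compositions. Penalize deviation from their mean so
    the feature stays general across compositions."""
    center = attr_feats.mean(dim=0, keepdim=True)           # shared estimate
    return ((attr_feats - center) ** 2).sum(dim=-1).mean()  # variance penalty
```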
- HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts [25.930021907054797]
We propose a novel framework that combines the Modern Hopfield Network with a Mixture of Experts to classify the compositions of previously unseen objects.
Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos.
arXiv Detail & Related papers (2023-11-23T07:32:20Z)
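
The two named ingredients can be sketched separately: a modern Hopfield lookup is one step of softmax attention over a stored pattern memory, and a soft mixture of experts gates several classifier heads. How HOMOE actually wires these together is not given in the summary, so the sketch below is only an illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HopfieldRetrieval(nn.Module):
    """One step of the continuous (modern) Hopfield update: softmax attention
    of the query over a learnable pattern memory."""
    def __init__(self, num_patterns, dim, beta=4.0):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_patterns, dim) * 0.02)
        self.beta = beta                                   # inverse temperature

    def forward(self, query):                              # query: (B, D)
        attn = F.softmax(self.beta * query @ self.memory.t(), dim=-1)
        return attn @ self.memory                          # retrieved pattern (B, D)

class SoftMoEHead(nn.Module):
    """Soft mixture of experts: every expert scores the input and a softmax
    gate mixes their logits."""
    def __init__(self, dim, num_classes, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, num_classes)
                                      for _ in range(num_experts)])
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):                                  # x: (B, D)
        weights = F.softmax(self.gate(x), dim=-1)          # (B, E)
        logits = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        return torch.einsum('be,bec->bc', weights, logits)
```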
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently.
We estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space.
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state-of-the-art performance in both OW-CZSL and pCZSL.
arXiv Detail & Related papers (2022-05-13T17:18:15Z)
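
The inference rule the summary describes is easy to sketch: score every (state, object) pair from independent primitive predictions, then prune pairs an external knowledge source deems unfeasible. The additive score combination and the binary feasibility mask are simplifying assumptions.

```python
import torch

def kg_sp_predict(state_logits, obj_logits, feasibility):
    """state_logits: (B, S); obj_logits: (B, O); feasibility: (S, O) in {0, 1},
    derived from external knowledge. Returns the best feasible (state, object)
    index pair per image."""
    pair_scores = state_logits.unsqueeze(2) + obj_logits.unsqueeze(1)   # (B, S, O)
    pair_scores = pair_scores.masked_fill(feasibility.unsqueeze(0) == 0,
                                          float('-inf'))                # prune
    best = pair_scores.flatten(1).argmax(dim=1)                         # (B,)
    num_obj = obj_logits.size(1)
    return torch.div(best, num_obj, rounding_mode='floor'), best % num_obj
```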
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), to fully leverage the structure information embedded in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-processing approach, termed the self-correlation map generating (SCG) module, to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
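
A toy version of the second-stage idea: use the self-correlation of deep features as an affinity matrix to propagate a coarse activation into a structure-preserving map. The cosine affinity and the seed-propagation step are assumptions about the spirit of SCG, not its actual definition.

```python
import torch
import torch.nn.functional as F

def self_correlation_propagate(feat, seed):
    """feat: (B, C, H, W) convolutional features; seed: (B, H*W) coarse
    activation scores. Returns a refined (B, H, W) localization map."""
    B, C, H, W = feat.shape
    f = F.normalize(feat.flatten(2), dim=1)                 # (B, C, HW), unit channels
    corr = torch.einsum('bci,bcj->bij', f, f).clamp(min=0)  # (B, HW, HW) affinity
    refined = torch.einsum('bij,bj->bi', corr, seed)        # propagate seed activations
    return refined.view(B, H, W)
```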