Related papers: ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning

URL: http://arxiv.org/abs/2211.12417v4
Date: Fri, 15 Dec 2023 11:50:32 GMT
Title: ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning
Authors: Fushuo Huo, Wenchao Xu, Song Guo, Jingcai Guo, Haozhao Wang, Ziming Liu, Xiaocheng Lu
Abstract summary: Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space. We propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks.
Score: 29.591615811894265
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space, which induces a tremendously large output space containing all possible state-object compositions. Existing works either learn the joint compositional state-object embedding or predict simple primitives with separate classifiers. However, the former heavily relies on external word embedding methods, and the latter ignores the interactions of interdependent primitives. In this paper, we revisit the primitive prediction approach and propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks. Specifically, the cross-primitive compatibility module explicitly learns to model the interactions of state and object features with trainable memory units, efficiently acquiring cross-primitive visual attention to reason about high-feasibility compositions without the aid of external knowledge. Moreover, considering the partial-supervision setting (pCZSL) as well as the imbalance issue of multi-task prediction, we design a progressive training paradigm that enables the primitive classifiers to interact and obtain discriminative information in an easy-to-hard manner. Extensive experiments on three widely used benchmark datasets demonstrate that our method outperforms other representative methods in both the OW-CZSL and pCZSL settings by large margins.
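
The "simple primitives with separate classifiers" baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not ProCC itself: it only shows how independent state and object predictions might be combined over the full open-world output space. The label lists, logits, and the function name `score_compositions` are hypothetical, and scoring a composition as the product of the two probabilities is an assumption made for illustration.

```python
import math
from itertools import product


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_compositions(state_logits, object_logits, states, objects):
    """Score every state-object pair as p(state) * p(object).

    Assumption for illustration: the two classifiers are independent,
    so primitive interactions are ignored (the limitation the abstract
    attributes to this family of methods).
    """
    p_state = softmax(state_logits)
    p_obj = softmax(object_logits)
    scores = {}
    # Open-world setting: the output space is all |states| x |objects| pairs.
    for (i, s), (j, o) in product(enumerate(states), enumerate(objects)):
        scores[(s, o)] = p_state[i] * p_obj[j]
    return scores


# Hypothetical labels and classifier outputs for a single image.
states = ["wet", "dry", "sliced"]
objects = ["dog", "apple"]
scores = score_compositions([2.0, 0.5, 0.1], [0.3, 1.5], states, objects)
best = max(scores, key=scores.get)
print(len(scores), best)
```

With 3 states and 2 objects the output space already has 6 compositions, including implausible ones such as "sliced dog"; on real benchmarks this space grows to the product of the two vocabularies, which is why feasibility reasoning matters in the open-world setting.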

Related papers

EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning [31.95599022275838]
We propose EVA, a Mixture-of-Experts Semantic Variant Alignment framework for Compositional Zero-Shot Learning (CZSL). Specifically, we introduce domain-expert adaption, leveraging multiple experts to achieve token-aware learning and model high-quality primitive representations. Our method significantly outperforms other state-of-the-art CZSL methods on three popular benchmarks in both closed- and open-world settings.
arXiv Detail & Related papers (2025-06-26T04:00:55Z)
Learning Clustering-based Prototypes for Compositional Zero-shot Learning [56.57299428499455]
ClusPro is a robust clustering-based prototype mining framework for Compositional Zero-Shot Learning. It defines the conceptual boundaries of primitives through a set of diversified prototypes. ClusPro efficiently performs prototype clustering in a non-parametric fashion without the introduction of additional learnable parameters.
arXiv Detail & Related papers (2025-02-10T14:20:01Z)
Unified Framework for Open-World Compositional Zero-shot Learning [39.521304311470146]
Open-World Compositional Zero-Shot Learning (OW-CZSL) addresses the challenge of recognizing novel compositions of known primitives and entities. We introduce a novel module aimed at alleviating the computational burden associated with exhaustive exploration of all possible compositions during the inference stage. Our proposed model achieves state-of-the-art results in OW-CZSL on three datasets, while surpassing Large Vision Language Models (LLVM) on two datasets.
arXiv Detail & Related papers (2024-12-05T11:36:37Z)
Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [49.919635694894204]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
arXiv Detail & Related papers (2024-08-19T08:23:09Z)
Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning [12.558701595138928]
Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs. We are exploring Open World Compositional Zero-Shot Learning (OW-CZSL) in this study, where our test space encompasses all potential combinations of attributes and objects. Our approach involves utilizing the self-attention mechanism between attributes and objects to achieve better generalization from seen to unseen compositions.
arXiv Detail & Related papers (2024-07-18T17:11:29Z)
Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning [23.757252768668497]
Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs. The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object. We propose a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem.
arXiv Detail & Related papers (2024-06-21T08:18:30Z)
Supervised Stochastic Neighbor Embedding Using Contrastive Learning [4.560284382063488]
Clusters of samples belonging to the same class are pulled together in low-dimensional embedding space. We extend the self-supervised contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information.
arXiv Detail & Related papers (2023-09-15T00:26:21Z)
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object). We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning [86.5258816031722]
The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage. Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL. In Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions.
arXiv Detail & Related papers (2022-11-05T12:57:06Z)
KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images. Here, we revisit a simple CZSL baseline and predict the primitives, i.e. states and objects, independently. We estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space. Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL.
arXiv Detail & Related papers (2022-05-13T17:18:15Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.