Cross-composition Feature Disentanglement for Compositional Zero-shot Learning
- URL: http://arxiv.org/abs/2408.09786v1
- Date: Mon, 19 Aug 2024 08:23:09 GMT
- Title: Cross-composition Feature Disentanglement for Compositional Zero-shot Learning
- Authors: Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Zhuo Chen, Xiang Chen, Can Xu, Yuxiang Wang, Xiaoliang Xu
- Abstract summary: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL).
We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
- Score: 49.919635694894204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end, we propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions. More specifically, we leverage a compositional graph to define the overall primitive-sharing relationships between compositions, and build a task-specific architecture upon the recently successful large pre-trained vision-language model (VLM) CLIP, with dual cross-composition disentangling adapters (called L-Adapter and V-Adapter) inserted into CLIP's frozen text and image encoders, respectively. Evaluation on three popular CZSL benchmarks shows that our proposed solution significantly improves the performance of CZSL, and solid ablation studies verify the effectiveness of its components.
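The architecture is only outlined in the abstract; the following PyTorch sketch illustrates the two ingredients it names, trainable adapters inserted alongside a frozen backbone and a loss that ties primitive features together across primitive-sharing compositions. The module names, the bottleneck design, and the variance-style loss are assumptions for illustration, not the paper's actual L-Adapter/V-Adapter implementation.

```python
# Minimal sketch of trainable disentangling adapters over a frozen
# encoder. The real method inserts L-Adapters (text side) and
# V-Adapters (image side) into CLIP; here both are modeled by one
# hypothetical bottleneck module, and frozen-encoder outputs are
# stand-in random features.
import torch
import torch.nn as nn

class DisentanglingAdapter(nn.Module):
    """Bottleneck adapter with two heads: one for the attribute
    feature, one for the object feature (assumed design)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.attr_head = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.obj_head = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, h):
        # Residual connections keep the frozen backbone's features usable.
        return h + self.attr_head(h), h + self.obj_head(h)

def cross_composition_loss(attr_feats):
    """Pull together attribute features disentangled from different
    compositions that share the same attribute (e.g. 'wet dog',
    'wet car'), so the attribute feature generalizes across them."""
    center = attr_feats.mean(dim=0, keepdim=True)
    return ((attr_feats - center) ** 2).sum(dim=-1).mean()

# Toy usage: features of 4 compositions sharing one attribute.
h = torch.randn(4, 512)               # frozen-encoder outputs (stand-in)
adapter = DisentanglingAdapter(512)   # the only trainable part
attr, obj = adapter(h)
loss = cross_composition_loss(attr)
loss.backward()
```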
Related papers
- CSCNET: Class-Specified Cascaded Network for Compositional Zero-Shot Learning [62.090051975043544]
Attribute and object (A-O) disentanglement is a fundamental and critical problem for Compositional Zero-shot Learning (CZSL).
We propose a novel A-O disentangled framework for CZSL, namely the Class-specified Cascaded Network (CSCNet).
arXiv Detail & Related papers (2024-03-09T14:18:41Z)
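The cascade in the CSCNet summary above is not detailed here; as a point of reference, this is a minimal A-O disentanglement baseline in which an attribute branch and an object branch share one image feature, with the object branch conditioned on the attribute prediction as one hypothetical reading of "class-specified cascaded". The primitive counts match the common MIT-States benchmark (115 attributes, 245 objects).

```python
# Generic A-O disentanglement baseline (not CSCNet's actual cascade):
# one shared image feature is split by two branches into an attribute
# prediction and an object prediction.
import torch
import torch.nn as nn

class AODisentangler(nn.Module):
    def __init__(self, feat_dim: int, n_attrs: int, n_objs: int):
        super().__init__()
        self.attr_branch = nn.Linear(feat_dim, n_attrs)
        # A cascaded design can condition the object branch on the
        # attribute prediction (illustrative choice).
        self.obj_branch = nn.Linear(feat_dim + n_attrs, n_objs)

    def forward(self, x):
        attr_logits = self.attr_branch(x)
        obj_logits = self.obj_branch(torch.cat([x, attr_logits], dim=-1))
        return attr_logits, obj_logits

model = AODisentangler(feat_dim=512, n_attrs=115, n_objs=245)
attr_logits, obj_logits = model(torch.randn(8, 512))
```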
- Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning [23.2504379682456]
We introduce the Context-based and Diversity-driven Specificity learning framework for Compositional Zero-Shot Learning (CZSL).
Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context.
This novel approach allows for more accurate predictions by emphasizing specific attribute-object pairs and improves composition filtering in OW-CZSL.
arXiv Detail & Related papers (2024-02-27T06:50:31Z)
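A toy version of the diversity idea in the summary above: an attribute's specificity can be scored from how many distinct objects it combines with in the seen pairs. The paper's actual measure is context-based and learned, so treat this counting scheme as a simplification.

```python
# Illustrative "specificity" of an attribute from the diversity of
# objects it co-occurs with in the training compositions.
from collections import defaultdict

def attribute_specificity(pairs):
    """pairs: iterable of (attribute, object) seen compositions.
    An attribute applied to many distinct objects (e.g. 'red') is
    less specific than one applied to few (e.g. 'unripe')."""
    objs_per_attr = defaultdict(set)
    for attr, obj in pairs:
        objs_per_attr[attr].add(obj)
    n_objs = len({obj for _, obj in pairs})
    return {a: 1.0 - len(objs) / n_objs for a, objs in objs_per_attr.items()}

pairs = [("red", "apple"), ("red", "car"), ("red", "wine"),
         ("unripe", "apple")]
print(attribute_specificity(pairs))
# {'red': 0.0, 'unripe': 0.6666666666666667}  (3 distinct objects total)
```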
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) for this task.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning [29.591615811894265]
Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space.
We propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks.
arXiv Detail & Related papers (2022-11-19T10:09:46Z)
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
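The summary above mentions alignment at both the representation and gradient levels; the sketch below shows only the gradient-level half, a penalty on the cosine distance between two domains' loss gradients. The linear model and the way domains are batched are placeholders, not the paper's setup.

```python
# Gradient-level domain alignment: encourage the loss gradients of
# two domains (e.g. two compositions sharing a primitive) to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(512, 10)  # placeholder classifier

def domain_loss(x, y):
    return F.cross_entropy(model(x), y)

loss_a = domain_loss(torch.randn(16, 512), torch.randint(0, 10, (16,)))
loss_b = domain_loss(torch.randn(16, 512), torch.randint(0, 10, (16,)))

# create_graph=True keeps the gradients differentiable, so the
# alignment term itself can be optimized.
g_a = torch.autograd.grad(loss_a, model.parameters(), create_graph=True)
g_b = torch.autograd.grad(loss_b, model.parameters(), create_graph=True)
flat_a = torch.cat([g.flatten() for g in g_a])
flat_b = torch.cat([g.flatten() for g in g_b])
align = 1 - F.cosine_similarity(flat_a, flat_b, dim=0)

total = loss_a + loss_b + align
total.backward()
```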
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e. states and objects, independently.
We estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space.
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves the state of the art in both OW-CZSL and pCZSL.
arXiv Detail & Related papers (2022-05-13T17:18:15Z)
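Both halves of the KG-SP recipe summarized above, independent primitive prediction and knowledge-based feasibility filtering, are concrete enough to sketch. The feasibility matrix below is random placeholder data standing in for the externally derived prior.

```python
# Sketch of (1) predicting state and object independently and
# (2) masking compositions judged infeasible by external knowledge.
import torch
import torch.nn as nn

n_states, n_objects = 5, 7
state_head = nn.Linear(512, n_states)    # independent predictors
object_head = nn.Linear(512, n_objects)

x = torch.randn(1, 512)
p_state = state_head(x).softmax(-1)      # (1, n_states)
p_object = object_head(x).softmax(-1)    # (1, n_objects)

# The composition score factorizes over the two primitives.
scores = p_state.unsqueeze(-1) * p_object.unsqueeze(1)  # (1, S, O)

# Feasibility prior (placeholder values); infeasible pairs, e.g.
# 'rusty cat', are removed from the output space.
feasible = torch.rand(n_states, n_objects) > 0.3
scores = scores.masked_fill(~feasible, 0.0)

flat = scores.flatten(1).argmax(-1)
state_idx, object_idx = flat // n_objects, flat % n_objects
```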
- Learning Graph Embeddings for Compositional Zero-shot Learning [73.80007492964951]
In compositional zero-shot learning, the goal is to recognize unseen compositions of observed visual primitives, i.e., states and objects.
We propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features and latent representations of visual primitives in an end-to-end manner.
By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without relying on an external knowledge base like WordNet.
arXiv Detail & Related papers (2021-02-03T10:11:03Z)
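The compatibility idea behind CGE, summarized above, can be sketched with a hand-rolled one-step graph propagation: primitive and composition nodes share embeddings, and an image is scored against every composition node by a dot product. The adjacency, node count, and single GCN-style layer are illustrative choices, not the paper's architecture.

```python
# One propagation step over a toy compositional graph, then joint
# compatibility between an image feature and composition embeddings.
import torch
import torch.nn as nn

n_nodes, dim = 10, 512                 # e.g. states + objects + compositions
adj = torch.eye(n_nodes)               # self-loops
adj[0, 3] = adj[3, 0] = 1.0            # composition node 3 uses primitive 0
node_emb = nn.Parameter(torch.randn(n_nodes, dim))
gcn = nn.Linear(dim, dim)

# Average over neighbors, then transform (a GCN-style update).
deg = adj.sum(-1, keepdim=True)
node_out = torch.relu(gcn((adj / deg) @ node_emb))

image_feat = torch.randn(1, dim)       # from any image backbone
composition_nodes = node_out[3:]       # suppose nodes 3.. are compositions
compatibility = image_feat @ composition_nodes.T   # joint compatibility
pred = compatibility.argmax(-1)
```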
- On Learning Sets of Symmetric Elements [63.12061960528641]
This paper presents a principled approach to learning sets of general symmetric elements.
We first characterize the space of linear layers that are equivariant both to element reordering and to the inherent symmetries of elements.
We further show that networks that are composed of these layers, called Deep Sets for Symmetric Elements (DSS) layers, are universal approximators of both invariant and equivariant functions.
arXiv Detail & Related papers (2020-02-20T07:29:20Z)
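The DSS characterization summarized above has a compact concrete form: each layer combines a per-element symmetry-equivariant map with the same kind of map applied to the sum over the set, which makes the whole layer equivariant to element reordering. Below is a sketch for sets of images, where the inner equivariant maps are convolutions (translation symmetry); the specific sizes are illustrative, not the paper's exact parameterization.

```python
# DSS-style layer for a set of images: y_i = L1(x_i) + L2(sum_j x_j),
# where L1 and L2 are translation-equivariant (convolutions). The sum
# over the set makes the layer permutation-equivariant.
import torch
import torch.nn as nn

class DSSConvLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.per_element = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.aggregate = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):                     # x: (batch, set, C, H, W)
        pooled = x.sum(dim=1)                 # permutation-invariant sum
        b, n, c, h, w = x.shape
        y = self.per_element(x.reshape(b * n, c, h, w)).reshape(b, n, -1, h, w)
        z = self.aggregate(pooled).unsqueeze(1)
        return y + z                          # broadcasts over the set dim

layer = DSSConvLayer(3, 8)
out = layer(torch.randn(2, 5, 3, 16, 16))    # a set of 5 RGB images
assert out.shape == (2, 5, 8, 16, 16)
```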