Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2509.12711v1
- Date: Tue, 16 Sep 2025 06:05:31 GMT
- Title: Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning
- Authors: Haozhe Zhang, Chenchen Jing, Mingyu Liu, Qingsheng Wang, Hao Chen,
- Abstract summary: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions by learning prior knowledge of seen primitives. We propose a novel approach called Debiased Feature Augmentation (DeFA) to address these challenges.
- Score: 23.380192229142924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize unseen attribute-object compositions by learning prior knowledge of seen primitives, \textit{i.e.}, attributes and objects. Learning generalizable compositional representations in CZSL remains challenging due to the entangled nature of attributes and objects as well as the prevalence of long-tailed distributions in real-world data. Inspired by neuroscientific findings that imagination and perception share similar neural processes, we propose a novel approach called Debiased Feature Augmentation (DeFA) to address these challenges. The proposed DeFA integrates a disentangle-and-reconstruct framework for feature augmentation with a debiasing strategy. DeFA explicitly leverages the prior knowledge of seen attributes and objects by synthesizing high-fidelity composition features to support compositional generalization. Extensive experiments on three widely used datasets demonstrate that DeFA achieves state-of-the-art performance in both \textit{closed-world} and \textit{open-world} settings.
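The disentangle-and-reconstruct idea behind DeFA can be illustrated with a minimal toy sketch (all function names, shapes, and the split-in-half disentangler below are hypothetical illustrations, not the authors' implementation): primitive features are extracted from different seen images and recombined to synthesize a feature for an unseen composition.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 8

def disentangle(img_feat):
    """Hypothetical disentangler: split a composition feature into
    an attribute part and an object part (here, simply two halves)."""
    half = img_feat.shape[-1] // 2
    return img_feat[:half], img_feat[half:]

def reconstruct(attr_feat, obj_feat):
    """Hypothetical reconstructor: fuse primitive features back into
    a composition feature (here, simple concatenation)."""
    return np.concatenate([attr_feat, obj_feat])

# Two seen compositions: ("wet", "cat") and ("dry", "dog").
wet_cat = rng.normal(size=FEAT_DIM)
dry_dog = rng.normal(size=FEAT_DIM)

wet, _cat = disentangle(wet_cat)   # keep the attribute part of one image
_dry, dog = disentangle(dry_dog)   # keep the object part of another

# Synthesize a feature for the unseen composition ("wet", "dog").
wet_dog = reconstruct(wet, dog)
assert wet_dog.shape == (FEAT_DIM,)
```

In the actual method the disentangler and reconstructor would be learned networks and the synthesized features would be used to augment training; the sketch only shows the recombination pattern.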
Related papers
- Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning [86.58227205147546]
The goal of Open-Vocabulary Compositional Zero-Shot Learning (OV-CZSL) is to recognize attribute-object compositions in the open-vocabulary setting. We propose the Structure-aware Prompt Adaptation (SPA) method, which enables models to generalize from seen to unseen attributes and objects.
arXiv Detail & Related papers (2026-03-04T07:54:28Z) - A Conditional Probability Framework for Compositional Zero-shot Learning [86.86063926727489]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen combinations of known objects and attributes by leveraging knowledge from previously seen compositions. Traditional approaches primarily focus on disentangling attributes and objects, treating them as independent entities during learning. We adopt a Conditional Probability Framework (CPF) to explicitly model attribute-object dependencies.
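The conditional-probability idea summarized above can be sketched as follows (a toy illustration with made-up probabilities, not the paper's model): rather than scoring attributes and objects independently, the attribute score is conditioned on the predicted object.

```python
import numpy as np

attrs = ["wet", "dry"]
objs = ["cat", "dog"]

# Toy model outputs for one image: P(object | x) and P(attribute | object, x).
p_obj = np.array([0.7, 0.3])               # P(cat|x), P(dog|x)
p_attr_given_obj = np.array([[0.9, 0.1],   # P(wet|cat,x), P(dry|cat,x)
                             [0.2, 0.8]])  # P(wet|dog,x), P(dry|dog,x)

# Joint score for every composition: P(a, o | x) = P(o | x) * P(a | o, x).
joint = p_obj[:, None] * p_attr_given_obj  # shape (num_objs, num_attrs)

# Predict the highest-scoring attribute-object composition.
o_idx, a_idx = np.unravel_index(joint.argmax(), joint.shape)
print(attrs[a_idx], objs[o_idx])  # prints "wet cat"
```

The point of the factorization is that the attribute distribution can change with the object (e.g., "wet" is likelier for "cat" than for "car"), which independent primitive classifiers cannot express.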
arXiv Detail & Related papers (2025-07-23T10:20:52Z) - Learning Primitive Relations for Compositional Zero-Shot Learning [26.35330980336384]
We propose a novel framework, Learning Primitive Relations (LPR), designed to probabilistically capture the relationships between states and objects. LPR considers the dependencies between states and objects, enabling the model to infer the likelihood of unseen compositions.
arXiv Detail & Related papers (2025-01-24T08:10:05Z) - Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning [21.488599805772054]
Compositional zero-shot learning aims to recognize novel compositions of attributes and objects learned from seen compositions. Previous works disentangle attributes and objects by extracting shared and exclusive parts between image pairs sharing the same attribute (object). We propose a novel framework named multimodal large language model (MLLM) embeddings and attribute smoothing guided disentanglement for CZSL.
arXiv Detail & Related papers (2024-11-18T07:55:54Z) - Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [54.08741382593959]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, it is challenging to learn disentangled primitive features that are general across different compositions. We propose the solution of cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs.
arXiv Detail & Related papers (2024-08-19T08:23:09Z) - Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning [12.558701595138928]
Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs.
We are exploring Open World Compositional Zero-Shot Learning (OW-CZSL) in this study, where our test space encompasses all potential combinations of attributes and objects.
Our approach involves utilizing the self-attention mechanism between attributes and objects to achieve better generalization from seen to unseen compositions.
arXiv Detail & Related papers (2024-07-18T17:11:29Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Distilled Reverse Attention Network for Open-world Compositional Zero-Shot Learning [42.138756191997295]
Open-World Compositional Zero-Shot Learning (OW-CZSL) aims to recognize new compositions of seen attributes and objects.
OW-CZSL methods built on the conventional closed-world setting degrade severely due to the unconstrained OW test space.
We propose a novel Distilled Reverse Attention Network to address the challenges.
arXiv Detail & Related papers (2023-03-01T10:52:20Z) - Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z) - KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently.
We estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space.
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL.
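The KG-SP recipe summarized above (independent primitive prediction plus a feasibility prior from external knowledge) can be sketched like this; the probabilities and the feasibility mask below are made-up illustrations, not the paper's actual model or knowledge source.

```python
import numpy as np

states = ["wet", "rusty"]
objs = ["cat", "car"]

# Independent primitive predictions for one image.
p_state = np.array([0.6, 0.4])   # P(wet|x), P(rusty|x)
p_obj = np.array([0.3, 0.7])     # P(cat|x), P(car|x)

# Feasibility prior from external knowledge: "rusty cat" is infeasible.
feasible = np.array([[1, 1],     # wet cat, wet car
                     [0, 1]])    # rusty cat, rusty car

# Score every composition independently, then mask infeasible ones
# so they are removed from the output space.
scores = (p_state[:, None] * p_obj[None, :]) * feasible

s_idx, o_idx = np.unravel_index(scores.argmax(), scores.shape)
print(states[s_idx], objs[o_idx])  # prints "wet car"
```

The mask is what makes the open-world setting tractable: without it, the output space contains every state-object pair, most of which never occur.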
arXiv Detail & Related papers (2022-05-13T17:18:15Z) - Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is connected with analytical expressions to achieve compositional generalization.
Experiments on the well-known benchmark SCAN demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z) - Learning the Compositional Visual Coherence for Complementary Recommendations [62.60648815930101]
Complementary recommendations aim at providing users with product suggestions that are supplementary to and compatible with their obtained items.
We propose a novel Content Attentive Neural Network (CANN) to model the comprehensive compositional coherence on both global contents and semantic contents.
arXiv Detail & Related papers (2020-06-08T06:57:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.