Simple Primitives with Feasibility- and Contextuality-Dependence for
Open-World Compositional Zero-shot Learning
- URL: http://arxiv.org/abs/2211.02895v1
- Date: Sat, 5 Nov 2022 12:57:06 GMT
- Title: Simple Primitives with Feasibility- and Contextuality-Dependence for
Open-World Compositional Zero-shot Learning
- Authors: Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, Xiaojun Wu, and Yi
Yang
- Abstract summary: The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage.
Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL.
In Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions.
- Score: 86.5258816031722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of
novel state-object compositions that are absent during the training stage.
Previous methods of learning compositional embedding have shown effectiveness
in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance
tends to degrade significantly due to the large cardinality of possible
compositions. Some recent works separately predict simple primitives (i.e.,
states and objects) to reduce cardinality. However, they treat simple
primitives as independent probability distributions, ignoring the strong
dependence between states, objects, and compositions. In this paper, we model
the dependence of compositions via feasibility and contextuality.
Feasibility-dependence refers to the unequal feasibility relations between
simple primitives, e.g., \textit{hairy} is more feasible with \textit{cat} than
with \textit{building} in the real world. Contextuality-dependence represents
the contextual variance in images, e.g., \textit{cat} shows diverse appearances
under the states of \textit{dry} and \textit{wet}. We design Semantic Attention
(SA) and generative Knowledge Disentanglement (KD) to learn the dependence of
feasibility and contextuality, respectively. SA captures semantics in
compositions to alleviate impossible predictions, driven by the visual
similarity between simple primitives. KD disentangles images into unbiased
feature representations, easing contextual bias in predictions. Moreover, we
complement the current compositional probability model with feasibility and
contextuality in a compatible format. Finally, we conduct comprehensive
experiments to analyze and validate the superior or competitive performance of
our model, Semantic Attention and Knowledge Disentanglement guided Simple
Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.
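As a rough, non-authoritative illustration of how feasibility can enter an independent-primitive probability model "in a compatible format" (as the abstract puts it), here is a minimal PyTorch sketch. This is not the authors' released code: the module names, shapes, illustrative class counts, and the learned feasibility matrix are all assumptions made for exposition; the paper's SA module would instead derive feasibility from visual similarity between primitives.

```python
# Minimal sketch (not the authors' code) of scoring compositions from
# independent state/object heads plus an additive feasibility prior.
# All names, shapes, and the learned feasibility matrix are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeasibilityScorer(nn.Module):
    def __init__(self, feat_dim: int, n_states: int, n_objects: int):
        super().__init__()
        self.state_head = nn.Linear(feat_dim, n_states)    # models p(s | x)
        self.object_head = nn.Linear(feat_dim, n_objects)  # models p(o | x)
        # Learned feasibility logits over all (state, object) pairs.
        self.feasibility = nn.Parameter(torch.zeros(n_states, n_objects))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        log_p_s = F.log_softmax(self.state_head(feats), dim=-1)   # (B, S)
        log_p_o = F.log_softmax(self.object_head(feats), dim=-1)  # (B, O)
        joint = log_p_s.unsqueeze(2) + log_p_o.unsqueeze(1)       # (B, S, O)
        # Feasibility enters additively in log space, staying compatible
        # with the independent-primitive probability model.
        return joint + F.logsigmoid(self.feasibility)

# Usage: pick the highest-scoring (state, object) cell per image.
scorer = FeasibilityScorer(feat_dim=512, n_states=115, n_objects=245)
scores = scorer(torch.randn(4, 512))           # (4, 115, 245)
best = scores.flatten(1).argmax(-1)            # flat composition index
state_idx, object_idx = best // 245, best % 245
```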
Related papers
- ComAlign: Compositional Alignment in Vision-Language Models [2.3250871476216814]
We introduce Compositional Alignment (ComAlign) to discover more exact correspondence of text and image components.
Our methodology emphasizes that the compositional structure extracted from the text modality must also be retained in the image modality.
We train a lightweight network on top of existing visual and language encoders using a small dataset.
arXiv Detail & Related papers (2024-09-12T16:46:41Z)
- Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning [12.558701595138928]
Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs.
This study explores Open World Compositional Zero-Shot Learning (OW-CZSL), where the test space encompasses all potential combinations of attributes and objects.
Our approach applies self-attention between attribute and object representations to generalize better from seen to unseen compositions (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-07-18T17:11:29Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning [29.591615811894265]
Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space.
We propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks.
arXiv Detail & Related papers (2022-11-19T10:09:46Z)
- Learning Attention Propagation for Compositional Zero-Shot Learning [71.55375561183523]
We propose a novel method called Compositional Attention Propagated Embedding (CAPE).
CAPE learns to identify the structure shared between compositions and propagates knowledge between them to learn class embeddings for all seen and unseen compositions.
We show that our method outperforms previous baselines to set a new state-of-the-art on three publicly available benchmarks.
arXiv Detail & Related papers (2022-10-20T19:44:11Z)
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently.
We estimate the feasibility of each composition through external knowledge and use this prior to remove unfeasible compositions from the output space (see the masking sketch after this list).
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL.
arXiv Detail & Related papers (2022-05-13T17:18:15Z)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [52.341186561026724]
A lack of compositionality can have severe implications for robustness and fairness.
We introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis.
Results show that StyleT2I outperforms previous approaches in terms of consistency between the input text and synthesized images.
arXiv Detail & Related papers (2022-03-29T17:59:50Z)
- Learning Graph Embeddings for Open World Compositional Zero-Shot Learning [47.09665742252187]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of the state and object visual primitives seen during training.
We propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE).
Co-CGE models the dependencies among states, objects, and their compositions through a graph convolutional neural network (a one-layer sketch appears after this list).
arXiv Detail & Related papers (2021-05-03T17:08:21Z)
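The attribute-object self-attention idea from the Attention Based Simple Primitives entry above can be illustrated with a short, hedged sketch; the module name and its placement are assumptions for exposition, not that paper's released architecture.

```python
# Hedged sketch of attribute-object interaction via self-attention;
# module names and placement are illustrative assumptions.
import torch
import torch.nn as nn

class PrimitiveInteraction(nn.Module):
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, attr_feat: torch.Tensor, obj_feat: torch.Tensor):
        # Treat the two primitive features as a length-2 token sequence so
        # each prediction can condition on the other primitive.
        tokens = torch.stack([attr_feat, obj_feat], dim=1)   # (B, 2, D)
        mixed, _ = self.attn(tokens, tokens, tokens)
        return mixed[:, 0], mixed[:, 1]  # contextualized attr / obj features
```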
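For the KG-SP entry, here is a minimal sketch of open-world output-space filtering: independent primitive predictions are combined, then hard-masked by an externally derived feasibility table. The function below is an assumption for exposition, not the released KG-SP code.

```python
# Hedged sketch of KG-SP-style filtering: score compositions from
# independent primitive predictions, then hard-mask pairs an external
# knowledge source marks as unfeasible. Not the released implementation.
import torch

def masked_composition_scores(
    state_logits: torch.Tensor,   # (B, S) independent state predictions
    object_logits: torch.Tensor,  # (B, O) independent object predictions
    feasible: torch.Tensor,       # (S, O) bool mask from external knowledge
) -> torch.Tensor:
    log_p_s = state_logits.log_softmax(dim=-1)
    log_p_o = object_logits.log_softmax(dim=-1)
    joint = log_p_s.unsqueeze(2) + log_p_o.unsqueeze(1)   # (B, S, O)
    # Unfeasible compositions are removed from the open-world output space.
    return joint.masked_fill(~feasible, float("-inf"))
```

Note the contrast with the soft feasibility prior sketched after the main abstract: a hard mask excludes compositions outright, while a log-additive prior only down-weights them.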
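Finally, the graph idea in the Co-CGE entry can be summarized by a single graph-convolution layer over a node set containing states, objects, and compositions. Adjacency construction and the cosine-similarity classifier are omitted; everything here is an illustrative assumption rather than the authors' implementation.

```python
# Hedged one-layer sketch of propagating information over a graph whose
# nodes are states, objects, and compositions; the adjacency matrix is
# assumed given and symmetrically normalized.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, a_norm: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node embeddings for all primitives and compositions;
        # a_norm: (N, N) normalized adjacency linking each composition to
        # its state and object nodes.
        return torch.relu(a_norm @ self.lin(x))
```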