Simple Primitives with Feasibility- and Contextuality-Dependence for
Open-World Compositional Zero-shot Learning
- URL: http://arxiv.org/abs/2211.02895v1
- Date: Sat, 5 Nov 2022 12:57:06 GMT
- Title: Simple Primitives with Feasibility- and Contextuality-Dependence for
Open-World Compositional Zero-shot Learning
- Authors: Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, Xiaojun Wu, and Yi
Yang
- Abstract summary: The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of novel state-object compositions that are absent during the training stage.
Previous methods of learning compositional embedding have shown effectiveness in closed-world CZSL.
In Open-World CZSL (OW-CZSL), their performance tends to degrade significantly due to the large cardinality of possible compositions.
- Score: 86.5258816031722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of Compositional Zero-Shot Learning (CZSL) is to recognize images of
novel state-object compositions that are absent during the training stage.
Previous methods of learning compositional embedding have shown effectiveness
in closed-world CZSL. However, in Open-World CZSL (OW-CZSL), their performance
tends to degrade significantly due to the large cardinality of possible
compositions. Some recent works separately predict simple primitives (i.e.,
states and objects) to reduce cardinality. However, they treat simple
primitives as independent probability distributions, ignoring the strong
dependence between states, objects, and compositions. In this paper, we model
the dependence of compositions via feasibility and contextuality.
Feasibility-dependence refers to the unequal feasibility relations between
simple primitives, e.g., \textit{hairy} is more feasible with \textit{cat} than
with \textit{building} in the real world. Contextuality-dependence represents
the contextual variance in images, e.g., \textit{cat} shows diverse appearances
under the states of \textit{dry} and \textit{wet}. We design Semantic Attention
(SA) and generative Knowledge Disentanglement (KD) to learn the dependence of
feasibility and contextuality, respectively. SA captures semantics in
compositions to alleviate impossible predictions, driven by the visual
similarity between simple primitives. KD disentangles images into unbiased
feature representations, easing contextual bias in predictions. Moreover, we
complement the current compositional probability model with feasibility and
contextuality in a compatible format. Finally, we conduct comprehensive
experiments to analyze and validate the superior or competitive performance of
our model, Semantic Attention and Knowledge Disentanglement guided Simple
Primitives (SAD-SP), on three widely-used benchmark OW-CZSL datasets.
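As a rough, non-authoritative illustration of how feasibility can enter an independent-primitive probability model "in a compatible format" (as the abstract puts it), here is a minimal PyTorch sketch. This is not the authors' released code: the module names, shapes, illustrative class counts, and the learned feasibility matrix are all assumptions made for exposition; the paper's SA module would instead derive feasibility from visual similarity between primitives.

```python
# Minimal sketch (not the authors' code) of scoring compositions from
# independent state/object heads plus an additive feasibility prior.
# All names, shapes, and the learned feasibility matrix are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeasibilityScorer(nn.Module):
    def __init__(self, feat_dim: int, n_states: int, n_objects: int):
        super().__init__()
        self.state_head = nn.Linear(feat_dim, n_states)    # models p(s | x)
        self.object_head = nn.Linear(feat_dim, n_objects)  # models p(o | x)
        # Learned feasibility logits over all (state, object) pairs.
        self.feasibility = nn.Parameter(torch.zeros(n_states, n_objects))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        log_p_s = F.log_softmax(self.state_head(feats), dim=-1)   # (B, S)
        log_p_o = F.log_softmax(self.object_head(feats), dim=-1)  # (B, O)
        joint = log_p_s.unsqueeze(2) + log_p_o.unsqueeze(1)       # (B, S, O)
        # Feasibility enters additively in log space, staying compatible
        # with the independent-primitive probability model.
        return joint + F.logsigmoid(self.feasibility)

# Usage: pick the highest-scoring (state, object) cell per image.
scorer = FeasibilityScorer(feat_dim=512, n_states=115, n_objects=245)
scores = scorer(torch.randn(4, 512))           # (4, 115, 245)
best = scores.flatten(1).argmax(-1)            # flat composition index
state_idx, object_idx = best // 245, best % 245
```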
Related papers
- ComAlign: Compositional Alignment in Vision-Language Models [2.3250871476216814]
We introduce Compositional Alignment (ComAlign) to discover more exact correspondence of text and image components.
Our methodology emphasizes that the compositional structure extracted from the text modality must also be retained in the image modality.
We train a lightweight network on top of existing visual and language encoders using a small dataset.
arXiv Detail & Related papers (2024-09-12T16:46:41Z)
- Attention Based Simple Primitives for Open World Compositional Zero-Shot Learning [12.558701595138928]
Compositional Zero-Shot Learning (CZSL) aims to predict unknown compositions made up of attribute and object pairs.
This study explores Open World Compositional Zero-Shot Learning (OW-CZSL), where the test space encompasses all potential combinations of attributes and objects.
Our approach applies self-attention between attribute and object representations to generalize better from seen to unseen compositions (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-07-18T17:11:29Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning [29.591615811894265]
Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space.
We propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks.
arXiv Detail & Related papers (2022-11-19T10:09:46Z)
- Learning Attention Propagation for Compositional Zero-Shot Learning [71.55375561183523]
We propose a novel method called Compositional Attention Propagated Embedding (CAPE).
CAPE learns to identify the structure shared between compositions and propagates knowledge between them to learn class embeddings for all seen and unseen compositions.
We show that our method outperforms previous baselines to set a new state-of-the-art on three publicly available benchmarks.
arXiv Detail & Related papers (2022-10-20T19:44:11Z)
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e., states and objects, independently.
We estimate the feasibility of each composition through external knowledge and use this prior to remove unfeasible compositions from the output space (see the masking sketch after this list).
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL.
arXiv Detail & Related papers (2022-05-13T17:18:15Z)
- StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis [52.341186561026724]
A lack of compositionality can have severe implications for robustness and fairness.
We introduce a new framework, StyleT2I, to improve the compositionality of text-to-image synthesis.
Results show that StyleT2I outperforms previous approaches in terms of consistency between the input text and synthesized images.
arXiv Detail & Related papers (2022-03-29T17:59:50Z)
- Learning Graph Embeddings for Open World Compositional Zero-Shot Learning [47.09665742252187]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of the state and object visual primitives seen during training.
We propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE).
Co-CGE models the dependencies among states, objects, and their compositions through a graph convolutional neural network (a one-layer sketch appears after this list).
arXiv Detail & Related papers (2021-05-03T17:08:21Z)
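The attribute-object self-attention idea from the Attention Based Simple Primitives entry above can be illustrated with a short, hedged sketch; the module name and its placement are assumptions for exposition, not that paper's released architecture.

```python
# Hedged sketch of attribute-object interaction via self-attention;
# module names and placement are illustrative assumptions.
import torch
import torch.nn as nn

class PrimitiveInteraction(nn.Module):
    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, attr_feat: torch.Tensor, obj_feat: torch.Tensor):
        # Treat the two primitive features as a length-2 token sequence so
        # each prediction can condition on the other primitive.
        tokens = torch.stack([attr_feat, obj_feat], dim=1)   # (B, 2, D)
        mixed, _ = self.attn(tokens, tokens, tokens)
        return mixed[:, 0], mixed[:, 1]  # contextualized attr / obj features
```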
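For the KG-SP entry, here is a minimal sketch of open-world output-space filtering: independent primitive predictions are combined, then hard-masked by an externally derived feasibility table. The function below is an assumption for exposition, not the released KG-SP code.

```python
# Hedged sketch of KG-SP-style filtering: score compositions from
# independent primitive predictions, then hard-mask pairs an external
# knowledge source marks as unfeasible. Not the released implementation.
import torch

def masked_composition_scores(
    state_logits: torch.Tensor,   # (B, S) independent state predictions
    object_logits: torch.Tensor,  # (B, O) independent object predictions
    feasible: torch.Tensor,       # (S, O) bool mask from external knowledge
) -> torch.Tensor:
    log_p_s = state_logits.log_softmax(dim=-1)
    log_p_o = object_logits.log_softmax(dim=-1)
    joint = log_p_s.unsqueeze(2) + log_p_o.unsqueeze(1)   # (B, S, O)
    # Unfeasible compositions are removed from the open-world output space.
    return joint.masked_fill(~feasible, float("-inf"))
```

Note the contrast with the soft feasibility prior sketched after the main abstract: a hard mask excludes compositions outright, while a log-additive prior only down-weights them.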
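Finally, the graph idea in the Co-CGE entry can be summarized by a single graph-convolution layer over a node set containing states, objects, and compositions. Adjacency construction and the cosine-similarity classifier are omitted; everything here is an illustrative assumption rather than the authors' implementation.

```python
# Hedged one-layer sketch of propagating information over a graph whose
# nodes are states, objects, and compositions; the adjacency matrix is
# assumed given and symmetrically normalized.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, a_norm: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node embeddings for all primitives and compositions;
        # a_norm: (N, N) normalized adjacency linking each composition to
        # its state and object nodes.
        return torch.relu(a_norm @ self.lin(x))
```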