On Leveraging Variational Graph Embeddings for Open World Compositional
Zero-Shot Learning
- URL: http://arxiv.org/abs/2204.11848v1
- Date: Sat, 23 Apr 2022 13:30:08 GMT
- Title: On Leveraging Variational Graph Embeddings for Open World Compositional
Zero-Shot Learning
- Authors: Muhammad Umer Anwaar, Zhihui Pan, Martin Kleinsteuber
- Abstract summary: We learn composition of primitive concepts, i.e. objects and states, in such a way that even their novel compositions can be zero-shot classified.
We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts.
- Score: 3.9348884623092517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are able to identify and categorize novel compositions of known
concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn
compositions of primitive concepts, i.e., objects and states, in such a way
that even novel compositions can be zero-shot classified. In this work, we do
not assume any prior knowledge about the feasibility of novel compositions,
i.e., the open-world setting, where infeasible compositions dominate the
search space. We propose a Compositional Variational Graph Autoencoder
(CVGAE) approach for learning the variational embeddings of the primitive
concepts (nodes) as well as the feasibility of their compositions (via
edges). Such modelling makes CVGAE scalable to real-world application
scenarios. This is in contrast to the SOTA method, CGE, which is
computationally very expensive: because CVGAE needs nodes only for the
primitive concepts rather than for every candidate composition, the benchmark
C-GQA dataset requires 3.94 x 10^5 nodes under CGE but only 1323 under CVGAE.
We learn a mapping of the graph and image embeddings onto a common embedding
space. CVGAE adopts a deep metric learning approach and learns a similarity
metric in this space via a bi-directional contrastive loss between projected
graph and image embeddings. We validate the effectiveness of our approach on
three benchmark datasets. We also demonstrate, via an image retrieval task,
that the representations learnt by CVGAE are better suited for compositional
generalization.
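A minimal sketch of the two ingredients the abstract describes: variational node embeddings with an inner-product edge decoder (modelling composition feasibility), and a symmetric contrastive loss between projected graph and image embeddings. All layer sizes, the image-feature dimension, and the temperature are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of CVGAE's two ingredients as described in the abstract.
# Dimensions and the temperature are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVGAESketch(nn.Module):
    def __init__(self, in_dim=300, hid_dim=128, z_dim=64, img_dim=512):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.mu = nn.Linear(hid_dim, z_dim)
        self.logvar = nn.Linear(hid_dim, z_dim)
        self.img_proj = nn.Linear(img_dim, z_dim)  # maps image features into the common space

    def encode(self, x, adj):
        # One graph-convolution step (neighbour aggregation), then mu/log-variance heads.
        h = F.relu(self.enc(adj @ x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation trick
        return z, mu, logvar

    @staticmethod
    def decode_edges(z):
        # Inner-product decoder: edge probability models composition feasibility.
        return torch.sigmoid(z @ z.t())

def bidirectional_contrastive(img_z, graph_z, tau=0.1):
    # Symmetric InfoNCE: matched (image, composition) pairs sit on the diagonal.
    img_z, graph_z = F.normalize(img_z, dim=-1), F.normalize(graph_z, dim=-1)
    logits = img_z @ graph_z.t() / tau
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Because the graph contains only primitive nodes (1323 on C-GQA), `adj` stays a 1323 x 1323 matrix, which is where the scalability advantage over composition-level graphs comes from.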
Related papers
- Visual Representation Learning Guided By Multi-modal Prior Knowledge [29.954639194410586]
We propose Knowledge-Guided Visual representation learning (KGV) to improve generalization under distribution shift.
We use prior knowledge from two distinct modalities: 1) a knowledge graph (KG) with hierarchical and association relationships; and 2) generated synthetic images of visual elements semantically represented in the KG.
KGV consistently exhibits higher accuracy and data efficiency than the baselines across all experiments.
arXiv Detail & Related papers (2024-10-21T13:06:38Z)
- Transformer-based Image Generation from Scene Graphs [11.443097632746763]
Graph-structured scene descriptions can be efficiently used in generative models to control the composition of the generated image.
Previous approaches are based on the combination of graph convolutional networks and adversarial methods for layout prediction and image generation.
We show how employing multi-head attention to encode the graph information can improve the quality of the sampled data.
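As a rough illustration of that idea, a multi-head self-attention layer can encode a set of scene-graph node features directly; the batch size, node count, and feature dimension below are arbitrary assumptions.

```python
# Hedged sketch: encoding scene-graph node features with multi-head
# self-attention rather than graph convolutions. Sizes are assumptions.
import torch
import torch.nn as nn

node_feats = torch.randn(1, 12, 256)  # (batch, graph nodes, feature dim)
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
encoded, _ = attn(node_feats, node_feats, node_feats)  # every node attends to all others
print(encoded.shape)  # torch.Size([1, 12, 256])
```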
arXiv Detail & Related papers (2023-03-08T14:54:51Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
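A two-stream design of this kind might look roughly as follows; the mesh size, attribute dimensions, and late fusion by concatenation are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of a two-stream GCN: one stream per raw attribute
# (e.g. cell coordinates vs. normals), fused by concatenation.
import torch

def gcn_stream(x, adj, w):
    # Single propagation step: aggregate neighbours, project, activate.
    return torch.relu(adj @ x @ w)

adj = torch.eye(10)                                   # trivial 10-cell mesh graph
coords, normals = torch.randn(10, 12), torch.randn(10, 12)
w_c, w_n = torch.randn(12, 32), torch.randn(12, 32)
fused = torch.cat([gcn_stream(coords, adj, w_c),
                   gcn_stream(normals, adj, w_n)], dim=1)  # (10, 64) per-cell features
```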
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- GraphCoCo: Graph Complementary Contrastive Learning [65.89743197355722]
Graph Contrastive Learning (GCL) has shown promising performance in graph representation learning (GRL) without the supervision of manual annotations.
This paper proposes GraphCoCo, an effective graph complementary contrastive learning approach that addresses a shortcoming of existing GCL methods.
arXiv Detail & Related papers (2022-03-24T02:58:36Z)
- Representing Videos as Discriminative Sub-graphs for Action Recognition [165.54738402505194]
We introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in the videos.
We present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which novelly builds space-time graphs and clusters them into compact sub-graphs on each scale.
arXiv Detail & Related papers (2022-01-11T16:15:25Z)
- Learning Graph Embeddings for Open World Compositional Zero-Shot Learning [47.09665742252187]
Compositional Zero-Shot learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training.
We propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE).
Co-CGE models the dependency between states, objects and their compositions through a graph convolutional neural network.
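A single propagation step over such a graph could be sketched as below, assuming word-embedding node features and a hand-built adjacency where each composition node connects to its state and its object; none of the sizes come from the paper.

```python
# Hedged sketch of one GCN step over a graph of states, objects and
# compositions, with self-loops and symmetric normalisation.
import torch
import torch.nn.functional as F

def gcn_layer(x, adj, weight):
    a = adj + torch.eye(adj.size(0))       # add self-loops
    d = a.sum(dim=1).rsqrt().diag()        # D^{-1/2}
    return F.relu(d @ a @ d @ x @ weight)  # normalise, propagate, project

n = 5                              # e.g. 2 states + 2 objects + 1 composition
x = torch.randn(n, 300)            # word-embedding node features
adj = torch.zeros(n, n)
adj[4, 0] = adj[0, 4] = 1.0        # composition node linked to its state...
adj[4, 2] = adj[2, 4] = 1.0        # ...and its object
w = torch.randn(300, 128)
out = gcn_layer(x, adj, w)         # (5, 128) updated node embeddings
```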
arXiv Detail & Related papers (2021-05-03T17:08:21Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
Firstly, the local binary pattern is adopted to extract more descriptive features on each selected band, preserving local structures and subtle changes of a region (sketched below).
Secondly, adaptive neighbor assignment is introduced in the construction of the anchor graphs, to reduce computational complexity.
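A plain 8-neighbour local binary pattern over a single band can be sketched as follows; wrap-around borders and the band size are simplifying assumptions.

```python
# Hedged sketch of an 8-neighbour local binary pattern on one band.
import numpy as np

def lbp(band):
    # Compare each pixel with its 8 neighbours and pack the results into a byte.
    codes = np.zeros(band.shape, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neighbour = np.roll(np.roll(band, dy, axis=0), dx, axis=1)
        codes |= ((neighbour >= band).astype(np.uint8) << bit)
    return codes

band = np.random.rand(64, 64)      # one selected hyperspectral band
codes = lbp(band)                  # per-pixel texture codes for that band
```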
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Learning Graph Embeddings for Compositional Zero-shot Learning [73.80007492964951]
In compositional zero-shot learning, the goal is to recognize unseen compositions of observed visual primitives (states and objects).
We propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features and latent representations of visual primitives in an end-to-end manner.
By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without relying on an external knowledge base like WordNet.
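In spirit, the joint compatibility reduces to scoring an image feature against every composition embedding in the shared space; the dimensions and composition count below are placeholders, not values from the paper.

```python
# Hedged sketch of compatibility scoring between an image and all
# candidate composition embeddings; the highest score wins.
import torch

img = torch.randn(1, 128)          # projected image feature (dim assumed)
comps = torch.randn(400, 128)      # one embedding per candidate composition
scores = img @ comps.t()           # (1, 400) compatibility scores
pred = scores.argmax(dim=1)        # index of the predicted composition
```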
arXiv Detail & Related papers (2021-02-03T10:11:03Z)
- Zero-Shot Learning with Common Sense Knowledge Graphs [10.721717005752405]
We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space.
We introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations.
Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
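A transformer-style neighbourhood aggregation for one class node might be sketched like this, assuming the neighbours' embeddings are already available; the layer configuration is illustrative and not from the paper.

```python
# Hedged sketch: a class node's representation built by letting its
# graph neighbours attend to one another, then pooling.
import torch
import torch.nn as nn

neighbours = torch.randn(1, 6, 128)  # embeddings of 6 neighbouring concepts (assumed)
layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
class_repr = layer(neighbours).mean(dim=1)  # (1, 128) class representation
```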
arXiv Detail & Related papers (2020-06-18T17:46:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.