Learning Graph Embeddings for Compositional Zero-shot Learning
- URL: http://arxiv.org/abs/2102.01987v1
- Date: Wed, 3 Feb 2021 10:11:03 GMT
- Title: Learning Graph Embeddings for Compositional Zero-shot Learning
- Authors: Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata
- Abstract summary: In compositional zero-shot learning, the goal is to recognize unseen compositions of the visual primitives, i.e. states and objects, observed during training.
We propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features and latent representations of visual primitives in an end-to-end manner.
By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without relying on an external knowledge base like WordNet.
- Score: 73.80007492964951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In compositional zero-shot learning, the goal is to recognize unseen
compositions (e.g. old dog) of visual primitives, i.e. states (e.g. old, cute)
and objects (e.g. car, dog), observed in the training set. This is challenging
because the same state can, for example, alter the visual appearance of a dog
very differently from that of a car. As a solution, we propose a novel graph
formulation called Compositional Graph Embedding (CGE) that learns image
features, compositional classifiers, and latent representations of visual
primitives in an end-to-end manner. The key to our approach is exploiting the
dependency between states, objects, and their compositions within a graph
structure to enforce the relevant knowledge transfer from seen to unseen
compositions. By learning a joint compatibility that encodes semantics between
concepts, our model allows for generalization to unseen compositions without
relying on an external knowledge base like WordNet. We show that in the
challenging generalized compositional zero-shot setting our CGE significantly
outperforms the state of the art on MIT-States and UT-Zappos. We also propose a
new benchmark for this task based on the recent GQA dataset.
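The abstract describes the architecture only in prose. As a rough, minimal sketch of the idea (not the authors' released code), the PyTorch snippet below builds a single graph whose nodes are states, objects, and compositions, propagates node embeddings with a two-layer graph convolution, and scores an image feature against the composition embeddings via a dot-product compatibility. All class names, dimensions, the random node initialization, and the linear image head are illustrative assumptions; the actual CGE model differs in these details.

```python
# Minimal sketch of a CGE-style model: a shared graph over states, objects,
# and compositions, a GCN over that graph, and a dot-product compatibility
# between image features and composition embeddings. Illustrative only.
import torch
import torch.nn as nn


def build_adjacency(states, objects, compositions):
    """Symmetrically normalized adjacency over state/object/composition nodes.

    Each composition node is connected to its constituent state and object
    nodes; every node keeps a self-loop.
    """
    n_s, n_o = len(states), len(objects)
    n = n_s + n_o + len(compositions)
    A = torch.eye(n)
    for k, (s, o) in enumerate(compositions):
        c = n_s + n_o + k                       # index of the composition node
        s_idx, o_idx = states.index(s), n_s + objects.index(o)
        for i, j in [(c, s_idx), (c, o_idx)]:
            A[i, j] = A[j, i] = 1.0
    d = A.sum(1).rsqrt()                        # D^{-1/2}
    return d[:, None] * A * d[None, :]          # D^{-1/2} A D^{-1/2}


class CGESketch(nn.Module):
    def __init__(self, adj, n_nodes, img_dim=512, node_dim=300, hid_dim=512):
        super().__init__()
        self.register_buffer("adj", adj)
        # Random node features as a stand-in for word-embedding initialization.
        self.node_init = nn.Parameter(torch.randn(n_nodes, node_dim) * 0.02)
        self.gcn1 = nn.Linear(node_dim, hid_dim)     # H1 = ReLU(A H W1)
        self.gcn2 = nn.Linear(hid_dim, hid_dim)      # H2 = A H1 W2
        self.img_proj = nn.Linear(img_dim, hid_dim)  # stand-in for a CNN head

    def forward(self, img_feat, comp_slice):
        h = torch.relu(self.gcn1(self.adj @ self.node_init))
        h = self.gcn2(self.adj @ h)
        comp_emb = h[comp_slice]                     # composition nodes only
        return self.img_proj(img_feat) @ comp_emb.t()  # compatibility scores


# Toy usage: 2 states x 2 objects, all four compositions as classifier targets.
states, objects = ["old", "cute"], ["dog", "car"]
compositions = [(s, o) for s in states for o in objects]
adj = build_adjacency(states, objects, compositions)
model = CGESketch(adj, n_nodes=adj.shape[0])
scores = model(torch.randn(8, 512), comp_slice=slice(4, 8))  # (8 images, 4 compositions)
loss = nn.functional.cross_entropy(scores, torch.randint(0, 4, (8,)))
```

In this sketch, training reduces to cross-entropy over the composition scores; the graph structure is what lets gradients reach the shared state and object nodes, which is the intended route for transferring knowledge from seen to unseen compositions.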
Related papers
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308] (2023-05-23)
  Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
  Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
- Learning Attention Propagation for Compositional Zero-Shot Learning [71.55375561183523] (2022-10-20)
  We propose a novel method called Compositional Attention Propagated Embedding (CAPE).
  CAPE learns to identify the structure shared between related compositions and propagates knowledge between them to learn class embeddings for all seen and unseen compositions.
  We show that our method outperforms previous baselines, setting a new state of the art on three publicly available benchmarks.
- On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning [3.9348884623092517] (2022-04-23)
  We learn compositions of primitive concepts, i.e. objects and states, in such a way that even their novel compositions can be classified zero-shot.
  We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning variational embeddings of the primitive concepts.
- One-shot Scene Graph Generation [130.57405850346836] (2022-02-22)
  We propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task.
  Our method significantly outperforms existing state-of-the-art methods.
- Learning Graph Embeddings for Open World Compositional Zero-Shot Learning [47.09665742252187] (2021-05-03)
  Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training.
  We propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE).
  Co-CGE models the dependency between states, objects, and their compositions through a graph convolutional neural network.
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855] (2020-07-11)
  Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
  We consider the challenging problem of compositional generalization that emerges in this task due to a long-tailed data distribution.
  We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
- A causal view of compositional zero-shot recognition [42.63916938252048] (2020-06-25)
  People easily recognize new visual categories that are new combinations of known components.
  This compositional generalization capacity is critical for learning in real-world domains like vision and language.
  Here we describe an approach for compositional generalization that builds on causal ideas.
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448] (2020-01-07)
  We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
  Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
This list is automatically generated from the titles and abstracts of the papers listed on this site. The quality of the information is not guaranteed, and the site is not responsible for any consequences of its use.