Learning Graph Embeddings for Compositional Zero-shot Learning
- URL: http://arxiv.org/abs/2102.01987v1
- Date: Wed, 3 Feb 2021 10:11:03 GMT
- Title: Learning Graph Embeddings for Compositional Zero-shot Learning
- Authors: Muhammad Ferjad Naeem, Yongqin Xian, Federico Tombari, Zeynep Akata
- Abstract summary: In compositional zero-shot learning, the goal is to recognize unseen compositions of the visual primitives, i.e. states and objects, observed during training.
We propose a novel graph formulation called Compositional Graph Embedding (CGE) that learns image features and latent representations of visual primitives in an end-to-end manner.
By learning a joint compatibility that encodes semantics between concepts, our model allows for generalization to unseen compositions without relying on an external knowledge base like WordNet.
- Score: 73.80007492964951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In compositional zero-shot learning, the goal is to recognize unseen
compositions (e.g. old dog) of visual primitives, i.e. states (e.g. old, cute)
and objects (e.g. car, dog), observed in the training set. This is challenging
because the same state can, for example, alter the visual appearance of a dog
very differently from that of a car. As a solution, we propose a novel graph
formulation called Compositional Graph Embedding (CGE) that learns image
features, compositional classifiers, and latent representations of visual
primitives in an end-to-end manner. The key to our approach is exploiting the
dependency between states, objects, and their compositions within a graph
structure to enforce the relevant knowledge transfer from seen to unseen
compositions. By learning a joint compatibility that encodes semantics between
concepts, our model allows for generalization to unseen compositions without
relying on an external knowledge base like WordNet. We show that in the
challenging generalized compositional zero-shot setting our CGE significantly
outperforms the state of the art on MIT-States and UT-Zappos. We also propose a
new benchmark for this task based on the recent GQA dataset.
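The abstract describes the architecture only in prose. As a rough, minimal sketch of the idea (not the authors' released code), the PyTorch snippet below builds a single graph whose nodes are states, objects, and compositions, propagates node embeddings with a two-layer graph convolution, and scores an image feature against the composition embeddings via a dot-product compatibility. All class names, dimensions, the random node initialization, and the linear image head are illustrative assumptions; the actual CGE model differs in these details.

```python
# Minimal sketch of a CGE-style model: a shared graph over states, objects,
# and compositions, a GCN over that graph, and a dot-product compatibility
# between image features and composition embeddings. Illustrative only.
import torch
import torch.nn as nn


def build_adjacency(states, objects, compositions):
    """Symmetrically normalized adjacency over state/object/composition nodes.

    Each composition node is connected to its constituent state and object
    nodes; every node keeps a self-loop.
    """
    n_s, n_o = len(states), len(objects)
    n = n_s + n_o + len(compositions)
    A = torch.eye(n)
    for k, (s, o) in enumerate(compositions):
        c = n_s + n_o + k                       # index of the composition node
        s_idx, o_idx = states.index(s), n_s + objects.index(o)
        for i, j in [(c, s_idx), (c, o_idx)]:
            A[i, j] = A[j, i] = 1.0
    d = A.sum(1).rsqrt()                        # D^{-1/2}
    return d[:, None] * A * d[None, :]          # D^{-1/2} A D^{-1/2}


class CGESketch(nn.Module):
    def __init__(self, adj, n_nodes, img_dim=512, node_dim=300, hid_dim=512):
        super().__init__()
        self.register_buffer("adj", adj)
        # Random node features as a stand-in for word-embedding initialization.
        self.node_init = nn.Parameter(torch.randn(n_nodes, node_dim) * 0.02)
        self.gcn1 = nn.Linear(node_dim, hid_dim)     # H1 = ReLU(A H W1)
        self.gcn2 = nn.Linear(hid_dim, hid_dim)      # H2 = A H1 W2
        self.img_proj = nn.Linear(img_dim, hid_dim)  # stand-in for a CNN head

    def forward(self, img_feat, comp_slice):
        h = torch.relu(self.gcn1(self.adj @ self.node_init))
        h = self.gcn2(self.adj @ h)
        comp_emb = h[comp_slice]                     # composition nodes only
        return self.img_proj(img_feat) @ comp_emb.t()  # compatibility scores


# Toy usage: 2 states x 2 objects, all four compositions as classifier targets.
states, objects = ["old", "cute"], ["dog", "car"]
compositions = [(s, o) for s in states for o in objects]
adj = build_adjacency(states, objects, compositions)
model = CGESketch(adj, n_nodes=adj.shape[0])
scores = model(torch.randn(8, 512), comp_slice=slice(4, 8))  # (8 images, 4 compositions)
loss = nn.functional.cross_entropy(scores, torch.randint(0, 4, (8,)))
```

In this sketch, training reduces to cross-entropy over the composition scores; the graph structure is what lets gradients reach the shared state and object nodes, which is the intended route for transferring knowledge from seen to unseen compositions.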
Related papers
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308] (2023-05-23)
  Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
  Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
- Learning Attention Propagation for Compositional Zero-Shot Learning [71.55375561183523] (2022-10-20)
  We propose a novel method called Compositional Attention Propagated Embedding (CAPE).
  CAPE learns to identify the structure shared between related compositions and propagates knowledge between them to learn class embeddings for all seen and unseen compositions.
  We show that our method outperforms previous baselines, setting a new state of the art on three publicly available benchmarks.
- On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning [3.9348884623092517] (2022-04-23)
  We learn compositions of primitive concepts, i.e. objects and states, in such a way that even their novel compositions can be classified zero-shot.
  We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning variational embeddings of the primitive concepts.
- One-shot Scene Graph Generation [130.57405850346836] (2022-02-22)
  We propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task.
  Our method significantly outperforms existing state-of-the-art methods.
- Learning Graph Embeddings for Open World Compositional Zero-Shot Learning [47.09665742252187] (2021-05-03)
  Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions of state and object visual primitives seen during training.
  We propose a new approach, Compositional Cosine Graph Embeddings (Co-CGE).
  Co-CGE models the dependency between states, objects, and their compositions through a graph convolutional neural network.
- Generative Compositional Augmentations for Scene Graph Prediction [27.535630110794855] (2020-07-11)
  Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language.
  We consider the challenging problem of compositional generalization that emerges in this task due to a long-tailed data distribution.
  We propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs.
- A causal view of compositional zero-shot recognition [42.63916938252048] (2020-06-25)
  People easily recognize new visual categories that are new combinations of known components.
  This compositional generalization capacity is critical for learning in real-world domains like vision and language.
  Here we describe an approach for compositional generalization that builds on causal ideas.
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448] (2020-01-07)
  We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
  Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
This list is automatically generated from the titles and abstracts of the papers listed on this site. The quality of the information is not guaranteed, and the site is not responsible for any consequences of its use.