Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition
- URL: http://arxiv.org/abs/2108.04603v1
- Date: Tue, 10 Aug 2021 11:23:03 GMT
- Title: Relation-aware Compositional Zero-shot Learning for Attribute-Object Pair Recognition
- Authors: Ziwei Xu, Guangzhi Wang, Yongkang Wong, Mohan Kankanhalli
- Abstract summary: This paper proposes a novel model for recognizing images with composite attribute-object concepts.
We explore the three key properties (relation-awareness, consistency, and decoupling) required to learn rich and robust features for primitive concepts that compose attribute-object pairs.
To prevent the model from being biased towards seen composite concepts and reduce the entanglement between attributes and objects, we propose a blocking mechanism.
- Score: 17.464548471883948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel model for recognizing images with composite
attribute-object concepts, notably for composite concepts that are unseen
during model training. We aim to explore the three key properties required by
the task (relation-aware, consistent, and decoupled) to learn rich and
robust features for primitive concepts that compose attribute-object pairs. To
this end, we propose the Blocked Message Passing Network (BMP-Net). The model
consists of two modules. The concept module generates semantically meaningful
features for primitive concepts, whereas the visual module extracts visual
features for attributes and objects from input images. A message passing
mechanism is used in the concept module to capture the relations between
primitive concepts. Furthermore, to prevent the model from being biased towards
seen composite concepts and reduce the entanglement between attributes and
objects, we propose a blocking mechanism that equalizes the information
available to the model for both seen and unseen concepts. Extensive experiments
and ablation studies on two benchmarks show the efficacy of the proposed model.
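To make the described design concrete, here is a minimal PyTorch sketch of how a concept module with message passing over primitive concepts and a separate visual module might fit together. All module names, dimensions, the relation graph, and the pair-scoring rule are illustrative assumptions, not the paper's actual BMP-Net implementation; the blocking mechanism is only hinted at via the relation mask.

```python
# Hypothetical sketch of the two-module idea from the abstract.
# Not the authors' BMP-Net; all design choices below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptModule(nn.Module):
    """Embeds attribute/object primitives and relates them by message passing."""

    def __init__(self, n_attrs, n_objs, dim=128):
        super().__init__()
        self.attr_emb = nn.Embedding(n_attrs, dim)
        self.obj_emb = nn.Embedding(n_objs, dim)
        self.msg = nn.Linear(dim, dim)

    def forward(self, adj):
        # adj: (n_attrs, n_objs) 0/1 matrix of attribute-object relations.
        # A "blocking"-style mask could zero out edges so that seen pairs
        # receive no more information than unseen ones (assumption).
        a, o = self.attr_emb.weight, self.obj_emb.weight
        # One round of message passing: attributes aggregate messages from
        # related objects and vice versa, followed by a residual update.
        a_msg = adj @ self.msg(o) / adj.sum(1, keepdim=True).clamp(min=1)
        o_msg = adj.t() @ self.msg(a) / adj.t().sum(1, keepdim=True).clamp(min=1)
        return F.normalize(a + a_msg, dim=-1), F.normalize(o + o_msg, dim=-1)


class VisualModule(nn.Module):
    """Maps image features to separate attribute and object representations."""

    def __init__(self, feat_dim=512, dim=128):
        super().__init__()
        self.attr_head = nn.Linear(feat_dim, dim)
        self.obj_head = nn.Linear(feat_dim, dim)

    def forward(self, img_feat):
        return (F.normalize(self.attr_head(img_feat), dim=-1),
                F.normalize(self.obj_head(img_feat), dim=-1))


# Toy usage: score every attribute-object pair for a batch of image features.
n_attrs, n_objs = 5, 7
concepts, visual = ConceptModule(n_attrs, n_objs), VisualModule()
adj = torch.ones(n_attrs, n_objs)            # fully connected relation graph
attr_c, obj_c = concepts(adj)                # (n_attrs, d), (n_objs, d)
attr_v, obj_v = visual(torch.randn(4, 512))  # (batch, d) each
pair_scores = attr_v @ attr_c.t()            # (batch, n_attrs)
pair_scores = pair_scores.unsqueeze(2) + (obj_v @ obj_c.t()).unsqueeze(1)
print(pair_scores.shape)                     # (batch, n_attrs, n_objs)
```

Scoring attributes and objects separately, as above, is one simple way to keep the two factors decoupled; the paper's actual training objective and blocking mechanism are not reproduced here.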
Related papers
- Neural Concept Binder [22.074896812195437]
We introduce the Neural Concept Binder (NCB), a framework for deriving both discrete and continuous concept representations.
The structured nature of NCB's concept representations allows for intuitive inspection and the straightforward integration of external knowledge.
We validate the effectiveness of NCB through evaluations on our newly introduced CLEVR-Sudoku dataset.
arXiv Detail & Related papers (2024-06-14T11:52:09Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z)
- Create Your World: Lifelong Text-to-Image Diffusion [75.14353789007902]
We propose the Lifelong text-to-image Diffusion Model (L2DM) to overcome catastrophic forgetting of previously encountered concepts.
To address this forgetting, our L2DM framework devises a task-aware memory enhancement module and an elastic-concept distillation module.
Our model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-09-08T16:45:56Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning aims to learn composed concepts of attribute-object pairs in a zero-shot fashion.
This paper introduces a new approach, termed translational concept embedding, to address the difficulties of this task in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality, semantically consistent representations that capture the intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- Interpretable Visual Reasoning via Induced Symbolic Space [75.95241948390472]
We study the problem of concept induction in visual reasoning, i.e., identifying concepts and their hierarchical relationships from question-answer pairs associated with images.
We first design a new framework named object-centric compositional attention model (OCCAM) to perform the visual reasoning task with object-level visual features.
We then develop a method to induce concepts of objects and relations using clues from the attention patterns between objects' visual features and question words.
arXiv Detail & Related papers (2020-11-23T18:21:49Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gain on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft)
arXiv Detail & Related papers (2020-03-31T12:22:51Z)