Learning to Infer Unseen Attribute-Object Compositions
- URL: http://arxiv.org/abs/2010.14343v2
- Date: Tue, 3 Nov 2020 09:32:41 GMT
- Title: Learning to Infer Unseen Attribute-Object Compositions
- Authors: Hui Chen, Zhixiong Nan, Jingjing Jiang and Nanning Zheng
- Abstract summary: A graph-based model is proposed that can flexibly recognize both single- and multi-attribute-object compositions.
We build a large-scale Multi-Attribute dataset with 116,099 images and 8,030 composition categories.
- Score: 55.58107964602103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing unseen attribute-object compositions is critical for making
machines learn to decompose and compose complex concepts as people do. Most
existing methods are limited to recognizing single-attribute-object
compositions and can hardly distinguish compositions with
similar appearances. In this paper, a graph-based model is proposed that can
flexibly recognize both single- and multi-attribute-object compositions. The
model maps the visual features of images and the attribute-object category
labels represented by word embedding vectors into a latent space. Then,
according to the constraints of the attribute-object semantic association,
distances are calculated between visual features and the corresponding label
semantic features in the latent space. During inference, the composition
closest to the given image feature among all compositions is taken as
the prediction. In addition, we build a large-scale Multi-Attribute
Dataset (MAD) with 116,099 images and 8,030 composition categories. Experiments
on MAD and two other single-attribute-object benchmark datasets demonstrate the
effectiveness of our approach.
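The inference step described above, which picks the composition nearest to the image in the shared latent space, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' graph-based model. The image feature is assumed to be already projected into the latent space, the toy 3-d vectors stand in for word embeddings, and the hypothetical `compose_label` rule (mean of the attribute vectors plus the object vector) replaces the paper's learned mapping so that single- and multi-attribute labels share one form.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Composition = Tuple[Tuple[str, ...], str]  # (attributes, object)


def compose_label(attr_vecs: Sequence[Vector], obj_vec: Vector) -> Vector:
    # Hypothetical composition rule: average the attribute vectors, then add the
    # object vector. A stand-in for the paper's learned label-to-latent mapping.
    dim = len(obj_vec)
    mean_attr = [sum(v[i] for v in attr_vecs) / len(attr_vecs) for i in range(dim)]
    return [a + o for a, o in zip(mean_attr, obj_vec)]


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def infer(image_latent: Vector, label_latents: Dict[Composition, Vector]) -> Composition:
    # Nearest-composition inference: return the candidate whose latent label
    # vector has the smallest distance to the image's latent feature.
    return min(label_latents, key=lambda c: euclidean(image_latent, label_latents[c]))


# Toy word vectors (assumed, not real embeddings), treated as already latent.
words = {
    "red": [1.0, 0.0, 0.0],
    "wet": [0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 1.0],
    "cat": [0.0, 0.0, -1.0],
}

candidates: Dict[Composition, Vector] = {
    (("red",), "apple"): compose_label([words["red"]], words["apple"]),
    (("red", "wet"), "apple"): compose_label([words["red"], words["wet"]], words["apple"]),
    (("wet",), "cat"): compose_label([words["wet"]], words["cat"]),
}

prediction = infer([0.6, 0.4, 0.9], candidates)
print(prediction)  # (('red', 'wet'), 'apple')
```

Because every candidate, whether it carries one attribute or several, is reduced to a single latent vector, the same nearest-neighbor rule covers both cases; a trained model would replace the fixed composition rule with learned projections.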
Related papers
- MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning [33.12021227971062]
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
arXiv Detail & Related papers (2024-06-18T16:24:48Z)
- Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels [54.63611854474985]
In this paper, we propose to replace image attribute labels with feature extractors.
We use networks from different tasks to provide attribute features to our F2S model.
Our method makes it feasible to learn meaningful attribute scores for various aesthetic attribute sets in different types of images with only overall aesthetic scores.
arXiv Detail & Related papers (2023-12-06T01:41:49Z)
- UMAAF: Unveiling Aesthetics via Multifarious Attributes of Images [16.647573404422175]
We propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images.
UMAAF achieves state-of-the-art performance on TAD66K and AVA datasets.
arXiv Detail & Related papers (2023-11-19T11:57:01Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions using prior knowledge of known primitives (attributes and objects).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Learning Conditional Attributes for Compositional Zero-Shot Learning [78.24309446833398]
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts.
One of the challenges is to model attributes that interact with different objects; e.g., the attribute "wet" in "wet apple" and "wet cat" is different.
We argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings.
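The idea of conditioning an attribute embedding on the object and the input image can be illustrated with a toy sketch. The gating form below (the hypothetical `conditional_attribute`, which scales each attribute dimension by a sigmoid of the object and image vectors) is an assumption chosen only for illustration; the cited paper learns its conditional embeddings with a network that is not described here, and the 2-d vectors are invented.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def conditional_attribute(attr_vec: Vector, obj_vec: Vector, img_vec: Vector) -> Vector:
    # Hypothetical conditioning: gate each dimension of the attribute vector by
    # sigmoid(object + image), so the same attribute word maps to a different
    # embedding depending on the object and the input image.
    return [a * sigmoid(o + x) for a, o, x in zip(attr_vec, obj_vec, img_vec)]


# Toy 2-d vectors, assumed for illustration only.
wet = [1.0, 1.0]
apple = [2.0, -2.0]
cat = [-2.0, 2.0]
neutral_image = [0.0, 0.0]

wet_apple = conditional_attribute(wet, apple, neutral_image)
wet_cat = conditional_attribute(wet, cat, neutral_image)
print([round(v, 3) for v in wet_apple])  # [0.881, 0.119]
print([round(v, 3) for v in wet_cat])    # [0.119, 0.881]
```

With the same word vector for "wet", the two object contexts yield mirrored embeddings, which is the property the blurb describes: the attribute's representation depends on what it modifies.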
arXiv Detail & Related papers (2023-05-29T08:04:05Z)
- Learning Invariant Visual Representations for Compositional Zero-Shot Learning [30.472541551048508]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions using knowledge learned from seen attribute-object compositions in the training set.
We propose an invariant feature learning framework to align different domains at the representation and gradient levels.
Experiments on two CZSL benchmarks demonstrate that the proposed method significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2022-06-01T11:33:33Z)
- Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
We study the problem of compositional zero-shot learning for object-attribute recognition.
Prior works use visual features extracted with a backbone network, pre-trained for object classification.
We propose a novel architecture that can disentangle attribute and object features in the visual space.
arXiv Detail & Related papers (2022-05-17T17:59:36Z)
- Large-Scale Attribute-Object Compositions [28.97267708915054]
We study the problem of learning how to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
We train our framework with images from Instagram using hashtags as noisy weak supervision.
We make careful design choices for data collection and modeling, in order to handle noisy annotations and unseen compositions.
arXiv Detail & Related papers (2021-05-24T16:05:41Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill high-quality, semantically consistent representations that capture the intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.