Large-Scale Attribute-Object Compositions
- URL: http://arxiv.org/abs/2105.11373v1
- Date: Mon, 24 May 2021 16:05:41 GMT
- Title: Large-Scale Attribute-Object Compositions
- Authors: Filip Radenovic, Animesh Sinha, Albert Gordo, Tamara Berg, Dhruv Mahajan
- Abstract summary: We study the problem of learning to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data.
We train our framework with images from Instagram using hashtags as noisy weak supervision.
We make careful design choices for data collection and modeling, in order to handle noisy annotations and unseen compositions.
- Score: 28.97267708915054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of learning to predict attribute-object compositions from images, and its generalization to unseen compositions missing from the training data. To the best of our knowledge, this is the first large-scale study of this problem, involving hundreds of thousands of compositions. We train our framework with images from Instagram using hashtags as noisy weak supervision. We make careful design choices for data collection and modeling in order to handle noisy annotations and unseen compositions. Finally, extensive evaluations show that learning to compose classifiers outperforms late fusion of individual attribute and object predictions, especially in the case of unseen attribute-object pairs.
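To make the contrast concrete, the sketch below compares late fusion of independent attribute and object classifiers with composing a pair-specific classifier from primitive embeddings. This is a minimal illustration only, assuming a PyTorch setup; the module names, dimensions, and composer architecture are assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

D = 512                    # image feature dimension (illustrative)
N_ATTR, N_OBJ = 100, 200   # attribute / object vocabulary sizes (illustrative)

# --- Late fusion baseline: score every pair as the product of marginals. ---
attr_head = nn.Linear(D, N_ATTR)
obj_head = nn.Linear(D, N_OBJ)

def late_fusion_scores(feat):                # feat: (B, D)
    p_attr = attr_head(feat).sigmoid()       # (B, N_ATTR)
    p_obj = obj_head(feat).sigmoid()         # (B, N_OBJ)
    # Outer product -> an independence-assuming score for every pair.
    return p_attr.unsqueeze(2) * p_obj.unsqueeze(1)  # (B, N_ATTR, N_OBJ)

# --- Composed classifier: a small network maps (attribute embedding,
# object embedding) to the weights of a pair-specific linear classifier,
# so unseen pairs get a classifier built from their parts. ---
attr_emb = nn.Embedding(N_ATTR, D)
obj_emb = nn.Embedding(N_OBJ, D)
composer = nn.Sequential(nn.Linear(2 * D, D), nn.ReLU(), nn.Linear(D, D))

def composed_scores(feat, attr_ids, obj_ids):
    # Build one classifier per requested pair, seen or unseen.
    pair = torch.cat([attr_emb(attr_ids), obj_emb(obj_ids)], dim=-1)  # (P, 2D)
    w = composer(pair)                       # (P, D) classifier weights
    return feat @ w.t()                      # (B, P) pair scores

feat = torch.randn(4, D)
fused = late_fusion_scores(feat)                                   # (4, 100, 200)
pairs = composed_scores(feat, torch.tensor([3]), torch.tensor([7]))  # (4, 1)
```

In this toy setup, late fusion can only multiply marginal predictions, so an unseen pair is scored by classifiers that never saw the combination; the composed classifier instead synthesizes pair-specific weights from primitive embeddings, which is the property the abstract credits for the gain on unseen compositions.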
Related papers
- Cross-composition Feature Disentanglement for Compositional Zero-shot Learning [49.919635694894204]
Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL).
We propose cross-composition feature disentanglement, which takes multiple primitive-sharing compositions as inputs and constrains the disentangled primitive features to be general across these compositions.
arXiv Detail & Related papers (2024-08-19T08:23:09Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose Composition Transformer (CoT), a simple and scalable framework for this task.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are, by construction, computationally influenced by the exemplar.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z) - Learning to Annotate Part Segmentation with Gradient Matching [58.100715754135685]
This paper focuses on tackling semi-supervised part segmentation tasks by generating high-quality images with a pre-trained GAN.
In particular, we formulate annotator learning as a learning-to-learn problem.
We show that our method can learn annotators from a broad range of labelled images including real images, generated images, and even analytically rendered images.
arXiv Detail & Related papers (2022-11-06T01:29:22Z) - AMICO: Amodal Instance Composition [40.03865667370814]
Image composition aims to blend multiple objects to form a harmonized image.
We present Amodal Instance Composition for blending imperfect objects onto a target image.
Our results show state-of-the-art performance on public COCOA and KINS benchmarks.
arXiv Detail & Related papers (2022-10-11T23:23:14Z) - Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
We study the problem of compositional zero-shot learning for object-attribute recognition.
Prior works use visual features extracted with a backbone network pre-trained for object classification.
We propose a novel architecture that can disentangle attribute and object features in the visual space.
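As a rough illustration of the general idea (a generic sketch under assumed names and dimensions, not this paper's architecture), separate projection heads can map a shared backbone feature into attribute and object subspaces that are supervised independently:

```python
import torch
import torch.nn as nn

D, D_PRIM = 512, 128          # backbone / primitive dims (illustrative)
N_ATTR, N_OBJ = 100, 200      # vocabulary sizes (illustrative)

attr_proj = nn.Linear(D, D_PRIM)     # attribute subspace projection
obj_proj = nn.Linear(D, D_PRIM)      # object subspace projection
attr_cls = nn.Linear(D_PRIM, N_ATTR)
obj_cls = nn.Linear(D_PRIM, N_OBJ)

def disentangled_logits(feat):       # feat: (B, D) backbone features
    a = attr_proj(feat)              # attribute-specific embedding
    o = obj_proj(feat)               # object-specific embedding
    return attr_cls(a), obj_cls(o)   # trained with separate losses
```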
arXiv Detail & Related papers (2022-05-17T17:59:36Z) - PartImageNet: A Large, High-Quality Dataset of Parts [16.730418538593703]
We propose PartImageNet, a high-quality dataset with part segmentation annotations.
PartImageNet is unique because it offers part-level annotations on a general set of classes with non-rigid, articulated objects.
It can be utilized in multiple vision tasks, including but not limited to part discovery and few-shot learning.
arXiv Detail & Related papers (2021-12-02T02:12:03Z) - Learning to Infer Unseen Attribute-Object Compositions [55.58107964602103]
A graph-based model is proposed that can flexibly recognize both single- and multi-attribute-object compositions.
We build a large-scale Multi-Attribute dataset with 116,099 images and 8,030 composition categories.
arXiv Detail & Related papers (2020-10-27T14:57:35Z) - A causal view of compositional zero-shot recognition [42.63916938252048]
People easily recognize new visual categories that are novel combinations of known components.
This compositional generalization capacity is critical for learning in real-world domains like vision and language.
Here we describe an approach for compositional generalization that builds on causal ideas.
arXiv Detail & Related papers (2020-06-25T17:51:22Z) - CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset, CompGuessWhat?!, as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)