Learning to Predict Visual Attributes in the Wild
- URL: http://arxiv.org/abs/2106.09707v1
- Date: Thu, 17 Jun 2021 17:58:02 GMT
- Title: Learning to Predict Visual Attributes in the Wild
- Authors: Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan
Tran, Abhinav Shrivastava
- Abstract summary: We introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
We propose several techniques that systematically tackle these challenges, including a base model that utilizes both low- and high-level CNN features.
Using these techniques, we achieve improvements of nearly 3.7 mAP and 5.7 overall F1 points over the current state of the art.
- Score: 43.91237738107603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual attributes constitute a large portion of information contained in a
scene. Objects can be described using a wide variety of attributes which
portray their visual appearance (color, texture), geometry (shape, size,
posture), and other intrinsic properties (state, action). Existing work is
mostly limited to the study of attribute prediction in specific domains. In this
paper, we introduce a large-scale in-the-wild visual attribute prediction
dataset consisting of over 927K attribute annotations for over 260K object
instances. Formally, object attribute prediction is a multi-label
classification problem where all attributes that apply to an object must be
predicted. Our dataset poses significant challenges to existing methods due to
the large number of attributes, label sparsity, data imbalance, and object
occlusion. To this end, we propose several techniques that systematically
tackle these challenges, including a base model that utilizes both low- and
high-level CNN features with multi-hop attention, reweighting and resampling
techniques, a novel negative label expansion scheme, and a novel supervised
attribute-aware contrastive learning algorithm. Using these techniques, we
achieve improvements of nearly 3.7 mAP and 5.7 overall F1 points over the
current state of the art. Further details about the VAW dataset can be found at
http://vawdataset.com/.
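The abstract frames attribute prediction as multi-label classification under severe label sparsity and class imbalance. The sketch below is a minimal illustration of that setup, not the paper's actual model: a plain linear attribute head plus a binary cross-entropy loss with per-attribute positive reweighting and a mask for unannotated labels. All names (`AttributeHead`, `masked_weighted_bce`) and the sizes in the usage snippet are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributeHead(nn.Module):
    """Minimal multi-label head: one sigmoid logit per attribute on top of
    pooled object features (a stand-in for the paper's attention-based model)."""

    def __init__(self, feat_dim: int, num_attributes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_attributes)

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_attributes) raw logits; one independent sigmoid per attribute
        return self.fc(obj_feats)

def masked_weighted_bce(logits: torch.Tensor,
                        targets: torch.Tensor,
                        label_mask: torch.Tensor,
                        pos_weight: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over attributes.

    targets:    1 = attribute applies, 0 = it does not.
    label_mask: 1 only where an annotation exists; all other entries are
                ignored, reflecting the dataset's label sparsity.
    pos_weight: per-attribute weight on positives (e.g. inverse positive
                frequency) as a simple counter to class imbalance.
    """
    loss = F.binary_cross_entropy_with_logits(
        logits, targets, pos_weight=pos_weight, reduction="none")
    return (loss * label_mask).sum() / label_mask.sum().clamp(min=1.0)

# Illustrative usage with made-up sizes (2048-d features, 620 attributes).
head = AttributeHead(feat_dim=2048, num_attributes=620)
logits = head(torch.randn(8, 2048))
targets = torch.zeros(8, 620)
label_mask = torch.zeros(8, 620)
label_mask[:, :10] = 1.0          # pretend only 10 attributes are annotated
pos_weight = torch.ones(620)
loss = masked_weighted_bce(logits, targets, label_mask, pos_weight)
```

The mask is the key detail: in a sparse annotation regime most attributes are simply unlabeled for a given instance, and treating them all as negatives would swamp the loss. The paper's negative label expansion scheme infers additional reliable negatives instead, which this sketch does not attempt.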
Related papers
- An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection [7.531866919805308]
We introduce the Objects365-Attr dataset, an extension of the existing Objects365 dataset, distinguished by its attribute annotations.
This dataset reduces inconsistencies in object detection by integrating a broad spectrum of attributes, including color, material, state, texture and tone.
It contains an extensive collection of 5.6M object-level attribute descriptions, meticulously annotated across 1.4M bounding boxes.
arXiv Detail & Related papers (2024-09-10T07:53:32Z)
- MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning [33.12021227971062]
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
arXiv Detail & Related papers (2024-06-18T16:24:48Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z)
- Learning Concise and Descriptive Attributes for Visual Recognition [25.142065847381758]
We show that querying thousands of attributes can achieve performance competitive with image features.
We propose a novel learning-to-search method to discover those concise sets of attributes.
arXiv Detail & Related papers (2023-08-07T16:00:22Z)
- Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
We study the problem of compositional zero-shot learning for object-attribute recognition.
Prior works use visual features extracted with a backbone network, pre-trained for object classification.
We propose a novel architecture that can disentangle attribute and object features in the visual space.
arXiv Detail & Related papers (2022-05-17T17:59:36Z)
- GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction [27.561424604521026]
We propose a novel attribute prediction architecture named GlideNet.
GlideNet contains three distinct feature extractors.
It can achieve compelling results on two recent and challenging datasets.
arXiv Detail & Related papers (2022-03-07T00:32:37Z)
- Learning to Infer Unseen Attribute-Object Compositions [55.58107964602103]
A graph-based model is proposed that can flexibly recognize both single- and multi-attribute-object compositions.
We build a large-scale Multi-Attribute dataset with 116,099 images and 8,030 composition categories.
arXiv Detail & Related papers (2020-10-27T14:57:35Z)
- A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
- Joint Item Recommendation and Attribute Inference: An Adaptive Graph Convolutional Network Approach [61.2786065744784]
In recommender systems, users and items are associated with attributes, and users show preferences to items.
As annotating user (item) attributes is a labor intensive task, the attribute values are often incomplete with many missing attribute values.
We propose an Adaptive Graph Convolutional Network (AGCN) approach for joint item recommendation and attribute inference.
arXiv Detail & Related papers (2020-05-25T10:50:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.