GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for
Multi-category Attributes Prediction
- URL: http://arxiv.org/abs/2203.03079v1
- Date: Mon, 7 Mar 2022 00:32:37 GMT
- Title: GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for
Multi-category Attributes Prediction
- Authors: Kareem Metwaly, Aerin Kim, Elliot Branson and Vishal Monga
- Abstract summary: We propose a novel attribute prediction architecture named GlideNet.
GlideNet contains three distinct feature extractors.
It can achieve compelling results on two recent and challenging datasets.
- Score: 27.561424604521026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attaching attributes (such as color, shape, state, action) to object
categories is an important computer vision problem. Attribute prediction has
seen exciting recent progress and is often formulated as a multi-label
classification problem. Yet significant challenges remain in: 1) predicting
diverse attributes over multiple categories, 2) modeling attributes-category
dependency, 3) capturing both global and local scene context, and 4) predicting
attributes of objects with low pixel-count. To address these issues, we propose
a novel multi-category attribute prediction deep architecture named GlideNet,
which contains three distinct feature extractors. A global feature extractor
recognizes what objects are present in a scene, whereas a local one focuses on
the area surrounding the object of interest. Meanwhile, an intrinsic feature
extractor uses an extension of standard convolution dubbed Informed Convolution
to retrieve features of objects with low pixel-count. GlideNet uses gating
mechanisms with binary masks and its self-learned category embedding to combine
the dense embeddings. Collectively, the Global-Local-Intrinsic blocks
comprehend the scene's global context while attending to the characteristics of
the local object of interest. Finally, using the combined features, an
interpreter predicts the attributes, and the length of the output is determined
by the category, thereby removing unnecessary attributes. GlideNet can achieve
compelling results on two recent and challenging datasets -- VAW and CAR -- for
large-scale attribute prediction. For instance, it obtains more than 5% gain
over the state of the art in the mean recall (mR) metric. GlideNet's advantages are
especially apparent when predicting attributes of objects with low pixel counts
as well as attributes that demand global context understanding. Finally, we
show that GlideNet excels in training-starved real-world scenarios.
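Below is a minimal PyTorch-style sketch of how the pieces described in the abstract could fit together: a mask-normalized convolution standing in for Informed Convolution, sigmoid gates driven by the three dense embeddings plus a self-learned category embedding, and per-category heads so that the length of the attribute vector depends on the category. All names, dimensions, and the exact gating/normalization formulas are illustrative assumptions, not GlideNet's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InformedConv2d(nn.Module):
    """Convolution that aggregates only pixels inside a binary object mask and
    renormalizes by the number of valid pixels (in the spirit of the Informed
    Convolution named in the abstract; the exact formulation is assumed)."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.register_buffer("ones", torch.ones(1, 1, k, k))

    def forward(self, x, mask):
        # x: (B, C, H, W) features; mask: (B, 1, H, W) float, 1 inside the object.
        valid = F.conv2d(mask, self.ones, padding=self.ones.shape[-1] // 2)
        out = self.conv(x * mask)
        return out / valid.clamp(min=1.0)


class GlideFusion(nn.Module):
    """Gates and fuses the global, local, and intrinsic embeddings using a
    self-learned category embedding, then predicts attributes whose number
    depends on the object's category."""

    def __init__(self, d, num_categories, attrs_per_category):
        super().__init__()
        self.cat_embed = nn.Embedding(num_categories, d)  # self-learned category embedding
        self.gate = nn.Linear(4 * d, 3)                   # one gate per feature branch
        # One lightweight head per category so the output length matches that
        # category's attribute vocabulary (dropping irrelevant attributes).
        self.heads = nn.ModuleList([nn.Linear(d, n) for n in attrs_per_category])

    def forward(self, f_global, f_local, f_intrinsic, category_id):
        c = self.cat_embed(category_id)                                          # (B, d)
        g = torch.sigmoid(self.gate(torch.cat([f_global, f_local, f_intrinsic, c], dim=-1)))
        fused = g[:, 0:1] * f_global + g[:, 1:2] * f_local + g[:, 2:3] * f_intrinsic
        # Variable-length attribute logits, one tensor per sample.
        return [self.heads[cid](fused[i]) for i, cid in enumerate(category_id.tolist())]


# Example: fuse 256-d embeddings for two objects of categories 3 and 0.
fusion = GlideFusion(d=256, num_categories=10,
                     attrs_per_category=[4, 7, 5, 9, 3, 6, 2, 8, 5, 4])
feats = [torch.randn(2, 256) for _ in range(3)]
logits = fusion(*feats, category_id=torch.tensor([3, 0]))  # lengths 9 and 4
```

Giving each category its own linear head is just one simple way to realize category-dependent output lengths; the paper's interpreter and Informed Convolution are likely defined differently.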
Related papers
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- CAR -- Cityscapes Attributes Recognition: A Multi-category Attributes Dataset for Autonomous Vehicles [30.024877502540665]
We present a new dataset for attribute recognition -- Cityscapes Attributes Recognition (CAR).
The new dataset extends the well-known Cityscapes dataset with an additional yet important annotation layer: the attributes of the objects in each image.
The dataset has a structured and tailored taxonomy where each category has its own set of possible attributes.
arXiv Detail & Related papers (2021-11-16T06:00:43Z)
- Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction [26.20319853343761]
We show that object detection should be attribute-independent and that attributes should be largely object-independent.
We disentangle them with a two-stream model in which category and attribute features are computed independently while the classification heads share Regions of Interest (RoIs).
Compared with a traditional single-stream model, our model shows significant improvements over VG-20, a subset of Visual Genome, on both supervised and attribute transfer tasks.
arXiv Detail & Related papers (2021-08-25T22:27:06Z)
- Learning to Predict Visual Attributes in the Wild [43.91237738107603]
We introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
We propose several techniques that systematically tackle these challenges, including a base model that utilizes both low- and high-level CNN features.
Using these techniques, we achieve an improvement of nearly 3.7 mAP and 5.7 overall F1 points over the current state of the art.
arXiv Detail & Related papers (2021-06-17T17:58:02Z)
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
- CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning [78.3857991931479]
We present GROLLA, an evaluation framework for Grounded Language Learning with Attributes.
We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations.
arXiv Detail & Related papers (2020-06-03T11:21:42Z)
- Learning to Predict Context-adaptive Convolution for Semantic Segmentation [66.27139797427147]
Long-range contextual information is essential for achieving high-performance semantic segmentation.
We propose a Context-adaptive Convolution Network (CaC-Net) to predict a spatially-varying feature weighting vector (see the sketch below).
Our CaC-Net achieves superior segmentation performance on three public datasets.
arXiv Detail & Related papers (2020-04-17T13:09:17Z)
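As a rough illustration of the "spatially-varying feature weighting" idea summarized in the CaC-Net entry above, the sketch below predicts a per-position channel-weighting vector from each location's feature concatenated with a pooled global context, and uses it to reweight the feature map. This is an assumed simplification for illustration, not the actual CaC-Net formulation.

```python
import torch
import torch.nn as nn


class ContextWeighting(nn.Module):
    """Reweights features with a spatially-varying vector predicted from
    local features plus a pooled global-context summary (assumed scheme)."""

    def __init__(self, ch):
        super().__init__()
        self.context = nn.AdaptiveAvgPool2d(1)                # global context summary
        self.predict = nn.Conv2d(2 * ch, ch, kernel_size=1)   # per-position channel weights

    def forward(self, x):
        b, c, h, w = x.shape
        g = self.context(x).expand(b, c, h, w)                # broadcast context to every position
        weights = torch.sigmoid(self.predict(torch.cat([x, g], dim=1)))
        return x * weights                                     # context-reweighted features


x = torch.randn(1, 64, 32, 32)
y = ContextWeighting(64)(x)  # same shape as x
```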
This list is automatically generated from the titles and abstracts of the papers on this site.