Discriminative Region-based Multi-Label Zero-Shot Learning
- URL: http://arxiv.org/abs/2108.09301v1
- Date: Fri, 20 Aug 2021 17:56:47 GMT
- Title: Discriminative Region-based Multi-Label Zero-Shot Learning
- Authors: Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling
Shao, Mubarak Shah
- Abstract summary: Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternative approach to region-based, discriminability-preserving ZSL.
- Score: 145.0952336375342
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-label zero-shot learning (ZSL) is a more realistic counterpart of
standard single-label ZSL since several objects can co-exist in a natural
image. However, the occurrence of multiple objects complicates the reasoning
and requires region-specific processing of visual features to preserve their
contextual cues. We note that the best existing multi-label ZSL method takes a
shared approach towards attending to region features with a common set of
attention maps for all the classes. Such shared maps lead to diffused
attention, which does not discriminatively focus on relevant locations when the
number of classes is large. Moreover, mapping spatially-pooled visual features
to the class semantics leads to inter-class feature entanglement, thus
hampering classification. Here, we propose an alternative approach to
region-based, discriminability-preserving multi-label zero-shot classification.
Our approach maintains the spatial resolution to preserve region-level
characteristics and utilizes a bi-level attention module (BiAM) to enrich the
features by incorporating both region and scene context information. The
enriched region-level features are then mapped to the class semantics and only
their class predictions are spatially pooled to obtain image-level predictions,
thereby keeping the multi-class features disentangled. Our approach sets a new
state of the art on two large-scale multi-label zero-shot benchmarks: NUS-WIDE
and Open Images. On NUS-WIDE, our approach achieves an absolute gain of 6.9%
mAP for ZSL, compared to the best published results.
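The abstract's core distinction, pooling class predictions rather than features, can be illustrated with a minimal NumPy sketch. The sizes, the linear dot-product compatibility function, and max pooling over regions are illustrative assumptions, not the paper's actual BiAM module:

```python
import numpy as np

rng = np.random.default_rng(0)

R, D, C = 49, 16, 5                       # regions (e.g. a 7x7 grid), feature dim, classes
features = rng.normal(size=(R, D))        # region-level visual features
semantics = rng.normal(size=(C, D))       # class semantic embeddings (e.g. word vectors)

# Early pooling (the entangled baseline criticized in the abstract):
# spatially pool features first, then compare to class semantics.
pooled_feat = features.mean(axis=0)       # (D,)
early_scores = semantics @ pooled_feat    # (C,) image-level predictions

# Late pooling (the region-based approach): map each region to class
# scores first, then spatially pool the per-class prediction maps.
region_scores = features @ semantics.T    # (R, C) region-level class predictions
late_scores = region_scores.max(axis=0)   # (C,) image-level predictions

# With a purely linear map and average pooling, the two orders coincide,
# which is why a non-average pooling of predictions (max here) and the
# nonlinear feature enrichment are what preserve region discriminability.
assert np.allclose(early_scores, region_scores.mean(axis=0))
```

Note the design point in the final assertion: keeping per-region class scores and pooling them with a non-linear operator is what lets a single strongly responding region drive an image-level prediction, instead of being averaged away.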
Related papers
- `Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning [47.1040786932317]
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
We take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features.
A learned function maps the region features to region-specific attribute attention used to construct class part prototypes.
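The mapping from region features to attribute attention and then to part prototypes described above can be sketched in NumPy. The dimensions, the single linear map, and the softmax-over-regions attention are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

R, D, A = 36, 8, 4                  # regions, feature dim, attributes (illustrative sizes)
region_feats = rng.normal(size=(R, D))
W = rng.normal(size=(D, A))         # learned map: region feature -> attribute logits

# Softmax over regions gives, per attribute, a distribution over
# where in the image that attribute appears.
logits = region_feats @ W                                           # (R, A)
attn = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)   # column-wise softmax

# Attribute-specific part prototypes: attention-weighted sums of region features.
prototypes = attn.T @ region_feats  # (A, D), one prototype per attribute
```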
arXiv Detail & Related papers (2024-04-12T18:37:00Z) - Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by depicting the semantic-visual similarity between attributes and images.
arXiv Detail & Related papers (2024-02-03T09:18:49Z) - GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning [24.075034737719776]
This paper investigates the challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
We propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection.
Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
arXiv Detail & Related papers (2023-09-02T12:07:21Z) - Region Semantically Aligned Network for Zero-Shot Learning [18.18665627472823]
We propose a Region Semantically Aligned Network (RSAN) which maps local features of unseen classes to their semantic attributes.
We obtain each attribute from a specific region of the output and exploit these attributes for recognition.
Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2021-10-14T03:23:40Z) - Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to obtain visual attention regions for recognizing a novel object, guided by the attribute description.
This work implies the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z) - Isometric Propagation Network for Generalized Zero-shot Learning [72.02404519815663]
A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data.
We propose Isometric propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces.
IPN achieves state-of-the-art performance on three popular Zero-shot learning benchmarks.
arXiv Detail & Related papers (2021-02-03T12:45:38Z) - Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z) - Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves the Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.