Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
- URL: http://arxiv.org/abs/2007.01755v3
- Date: Wed, 9 Jun 2021 08:27:59 GMT
- Title: Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
- Authors: Bin-Bin Gao, Hong-Yu Zhou
- Abstract summary: We propose a simple but efficient two-stream framework to recognize multi-category objects from the global image down to local regions.
To bridge the gap between global and local streams, we propose a multi-class attentional region module.
Our method can efficiently and effectively recognize multi-class objects with an affordable computation cost and a parameter-free region localization module.
- Score: 20.2935275611948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label image recognition is a practical and challenging task
compared to single-label image classification. However, previous works may be
suboptimal because they rely on a large number of object proposals or complex
attentional region generation modules. In this paper, we propose a simple but
efficient two-stream framework to recognize multi-category objects from the
global image down to local regions, similar to how human beings perceive
objects. To bridge the gap between the global and local streams, we propose a
multi-class attentional region module which aims to keep the number of
attentional regions as small as possible while keeping their diversity as high
as possible. Our method can efficiently and effectively recognize multi-class
objects with an affordable computation cost and a parameter-free region
localization module. On three multi-label image classification benchmarks, we
set new state-of-the-art results with a single model, using only image
semantics without label dependency. In addition, the effectiveness of the
proposed method is extensively demonstrated under different factors such as
global pooling strategy, input size, and network architecture. Code is
available at https://github.com/gaobb/MCAR.
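The abstract's pipeline is concrete enough to sketch: the global stream scores the whole image, a parameter-free module turns the class activation maps of the top-scoring classes into one crop each, and the local stream re-scores those crops before a category-wise max merges both streams. The PyTorch sketch below is a minimal illustration under assumptions (a backbone ending in a feature map, a 1x1-conv classifier, and illustrative names such as mcar_like_inference, topk, and thresh); it is not the authors' released implementation, which lives at https://github.com/gaobb/MCAR.

```python
# Minimal MCAR-style two-stream sketch (illustrative assumptions, not the
# released code): `backbone` maps an image to a (1, C, h, w) feature map and
# `classifier` is a 1x1 convolution from C channels to K classes, so the
# class activation maps come for free.
import torch
import torch.nn.functional as F

def mcar_like_inference(backbone, classifier, image, topk=4, thresh=0.5):
    # Global stream: score the whole image.
    feat = backbone(image)                               # (1, C, h, w)
    cam = classifier(feat)                               # (1, K, h, w)
    global_scores = torch.sigmoid(cam.mean(dim=(2, 3)))  # (1, K)

    # Parameter-free region localization: for each of the top-k classes,
    # threshold its normalized activation map and take the bounding box.
    _, top_classes = global_scores[0].topk(topk)
    _, _, H, W = image.shape
    local_scores = []
    for k in top_classes:
        m = cam[0, k]
        m = (m - m.min()) / (m.max() - m.min() + 1e-6)
        ys, xs = torch.nonzero(m > thresh, as_tuple=True)
        if len(ys) == 0:
            continue
        sy, sx = H / m.shape[0], W / m.shape[1]          # feature -> image scale
        y0, y1 = int(ys.min() * sy), int((ys.max() + 1) * sy)
        x0, x1 = int(xs.min() * sx), int((xs.max() + 1) * sx)
        crop = F.interpolate(image[:, :, y0:y1, x0:x1], size=(H, W),
                             mode='bilinear', align_corners=False)
        # Local stream: re-score the attended region with the same weights.
        local = torch.sigmoid(classifier(backbone(crop)).mean(dim=(2, 3)))
        local_scores.append(local)

    if not local_scores:
        return global_scores
    # Merge streams with a category-wise max.
    return torch.max(global_scores, torch.stack(local_scores).max(dim=0).values)
```

In this sketch the two streams share weights, which is one plausible reading of "parameter-free": region localization adds no parameters beyond the classifier itself.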
Related papers
- R-MAE: Regions Meet Masked Autoencoders [113.73147144125385]
We explore regions as a potential visual analogue of words for self-supervised image representation learning.
Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions.
arXiv Detail & Related papers (2023-06-08T17:56:46Z)
- Facing the Void: Overcoming Missing Data in Multi-View Imagery [0.783788180051711]
We propose a novel technique for multi-view image classification that is robust to missing data.
The proposed method, based on state-of-the-art deep learning-based approaches and metric learning, can be easily adapted and exploited in other applications and domains.
Results show that the proposed algorithm provides improvements in multi-view image classification accuracy when compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T13:21:27Z)
- Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition [24.406654146411682]
This paper builds on the Vision Transformer (ViT).
Our goal is to leverage ViT's patch tokens and self-attention mechanism to mine rich instances in multi-label images.
We propose a weakly supervised object localization-based approach to extract multi-scale local features.
arXiv Detail & Related papers (2022-04-22T14:38:40Z)
- SATS: Self-Attention Transfer for Continual Semantic Segmentation [50.51525791240729]
Continual semantic segmentation suffers from the same catastrophic forgetting issue as continual classification learning.
This study proposes to transfer a new type of knowledge-relevant information, namely the relationships between elements within each image.
This relationship information can be effectively obtained from the self-attention maps of a Transformer-style segmentation model (a minimal sketch follows).
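Taken at face value, that suggests a simple distillation term: while the model learns new classes, keep its self-attention maps close to those of the frozen previous model. The sketch below is an assumption-laden illustration (the function name, map shapes, and KL form are illustrative, not the SATS paper's actual loss):

```python
# Hedged sketch of self-attention transfer for continual segmentation:
# distill the frozen old model's attention maps into the new model.
# Shapes and the exact loss are illustrative assumptions.
import torch
import torch.nn.functional as F

def attention_transfer_loss(attn_old: torch.Tensor,
                            attn_new: torch.Tensor) -> torch.Tensor:
    """attn_*: (B, heads, N, N) softmax attention maps taken from matching
    layers of the frozen old model and the model being trained."""
    # KL(old || new), summed over attention entries, averaged over the batch;
    # F.kl_div expects its first argument in log space.
    return F.kl_div(attn_new.clamp_min(1e-8).log(), attn_old,
                    reduction='batchmean')
```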
arXiv Detail & Related papers (2022-03-15T06:09:28Z)
- Local and Global GANs with Semantic-Aware Upsampling for Image Generation [201.39323496042527]
We consider generating images using local context.
We propose a class-specific generative network using semantic maps as guidance.
Lastly, we propose a novel semantic-aware upsampling method.
arXiv Detail & Related papers (2022-02-28T19:24:25Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while being as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- MFNet: Multi-class Few-shot Segmentation Network with Pixel-wise Metric Learning [34.059257121606336]
This work focuses on few-shot semantic segmentation, which is still a largely unexplored field.
We first present a novel multi-way encoding and decoding architecture which effectively fuses multi-scale query information and multi-class support information into one query-support embedding.
Experiments on standard benchmarks PASCAL-5i and COCO-20i show clear benefits of our method over the state of the art in few-shot segmentation.
arXiv Detail & Related papers (2021-10-30T11:37:36Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)