Residual Attention: A Simple but Effective Method for Multi-Label
Recognition
- URL: http://arxiv.org/abs/2108.02456v1
- Date: Thu, 5 Aug 2021 08:45:57 GMT
- Title: Residual Attention: A Simple but Effective Method for Multi-Label
Recognition
- Authors: Ke Zhu, Jianxin Wu
- Abstract summary: We propose an embarrassingly simple module, named class-specific residual attention (CSRA).
CSRA generates class-specific features for every category by proposing a simple spatial attention score, and then combines it with the class-agnostic average pooling feature.
With only 4 lines of code, CSRA also leads to consistent improvement across many diverse pretrained models and datasets without any extra training.
- Score: 29.18904701720024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label image recognition is a challenging computer vision task of
practical use. Progress in this area, however, is often characterized by
complicated methods, heavy computations, and lack of intuitive explanations. To
effectively capture different spatial regions occupied by objects from
different categories, we propose an embarrassingly simple module, named
class-specific residual attention (CSRA). CSRA generates class-specific
features for every category by proposing a simple spatial attention score, and
then combines it with the class-agnostic average pooling feature. CSRA achieves
state-of-the-art results on multi-label recognition, and at the same time is
much simpler than existing methods. Furthermore, with only 4 lines of code, CSRA also leads
to consistent improvement across many diverse pretrained models and datasets
without any extra training. CSRA is both easy to implement and light in
computation, and also enjoys intuitive explanations and visualizations.
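The abstract describes CSRA as a spatial attention score per category whose result is added to the class-agnostic average-pooled feature. The following is a minimal sketch of one plausible reading of that description, not the authors' released code. The function name, the use of the classifier weights to compute attention, the softmax `temperature`, and the mixing weight `lam` are all assumptions introduced here for illustration.

```python
import math


def class_specific_residual_scores(features, class_weights, lam=0.2, temperature=1.0):
    """Hypothetical sketch of a class-specific residual attention head.

    features:      list of N spatial feature vectors, each of length D
    class_weights: list of C classifier vectors, each of length D
    lam:           assumed weight of the class-specific (residual) term
    temperature:   assumed sharpness of the spatial softmax
    Returns a list of C logits, one per category.
    """
    n = len(features)
    dim = len(features[0])

    # Class-agnostic feature: average pooling over all spatial locations.
    avg = [sum(f[d] for f in features) / n for d in range(dim)]

    logits = []
    for w in class_weights:
        # Response of this category at every spatial location.
        resp = [sum(wd * xd for wd, xd in zip(w, f)) for f in features]

        # Spatial attention: softmax over locations (shifted by the max for stability).
        peak = max(resp)
        exps = [math.exp(temperature * (r - peak)) for r in resp]
        total = sum(exps)
        attn = [e / total for e in exps]

        # Class-specific feature: attention-weighted sum of the location features.
        spec = [sum(a * f[d] for a, f in zip(attn, features)) for d in range(dim)]

        # Residual combination: average-pooled feature plus the scaled class-specific one.
        fused = [g + lam * s for g, s in zip(avg, spec)]
        logits.append(sum(wd * fd for wd, fd in zip(w, fused)))
    return logits


# Toy input: two spatial locations, two feature channels, two categories.
toy_features = [[1.0, 0.0], [0.0, 1.0]]
toy_weights = [[1.0, 0.0], [0.0, 1.0]]
toy_logits = class_specific_residual_scores(toy_features, toy_weights)
```

In this sketch, setting `lam=0` reduces the head to ordinary average pooling followed by the classifier, and a very large `temperature` makes the class-specific term behave like max pooling over locations, so each logit becomes the average response plus `lam` times the maximum response.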
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling [20.982992381790034]
We propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling.
Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets.
arXiv Detail & Related papers (2023-09-28T03:40:30Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Few-Shot Learning by Integrating Spatial and Frequency Representation [25.11147383752403]
We propose to integrate the frequency information into the learning model to boost the discrimination ability of the system.
We employ Discrete Cosine Transformation (DCT) to generate the frequency representation, and then integrate the features from both the spatial domain and the frequency domain for classification.
arXiv Detail & Related papers (2021-05-11T21:44:31Z)
- Spatial-spectral Hyperspectral Image Classification via Multiple Random Anchor Graphs Ensemble Learning [88.60285937702304]
This paper proposes a novel spatial-spectral HSI classification method via multiple random anchor graphs ensemble learning (RAGE).
First, the local binary pattern is adopted to extract more descriptive features on each selected band, which preserves local structures and subtle changes of a region.
Second, adaptive neighbors assignment is introduced in the construction of the anchor graph to reduce the computational complexity.
arXiv Detail & Related papers (2021-03-25T09:31:41Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning [83.48587570246231]
Visual Similarity plays an important role in many computer vision applications.
Deep metric learning (DML) is a powerful framework for learning such similarities.
We propose and study multiple complementary learning tasks, targeting conceptually different data relationships.
We learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance.
arXiv Detail & Related papers (2020-04-28T12:26:50Z)