Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
- URL: http://arxiv.org/abs/2101.06635v1
- Date: Sun, 17 Jan 2021 10:15:02 GMT
- Title: Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
- Authors: Ardhendu Behera, Zachary Wharton, Pradeep Hewage, Asish Bera
- Abstract summary: Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition.
We propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients.
We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep convolutional neural networks (CNNs) have shown a strong ability in
mining discriminative object pose and parts information for image recognition.
For fine-grained recognition, a context-aware, rich feature representation of
the object/scene plays a key role, since objects exhibit significant variance
within the same subcategory and subtle variance among different subcategories.
Finding the subtle variance that fully characterizes the object/scene is not
straightforward. To address this, we propose a novel context-aware attentional
pooling (CAP) that effectively captures subtle changes via sub-pixel gradients,
and learns to attend to informative integral regions and to weigh their importance in
discriminating different subcategories without requiring the bounding-box
and/or distinguishable part annotations. We also introduce a novel feature
encoding by considering the intrinsic consistency between the informativeness
of the integral regions and their spatial structures to capture the semantic
correlation among them. Our approach is simple yet extremely effective and can
be easily applied on top of a standard classification backbone network. We
evaluate our approach using six state-of-the-art (SotA) backbone networks and
eight benchmark datasets. Our method significantly outperforms the SotA
approaches on six datasets and is very competitive with the remaining two.
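The abstract describes CAP only at a high level. As a rough illustration, here is a minimal NumPy sketch of the general idea of attention-weighted pooling over features of rectangular "integral" regions. It is not the authors' implementation. It assumes a backbone feature map is already available. The grid-based region enumeration (`integral_regions`), the scaled dot-product self-attention used as the context step, and the parameters `w_q`, `w_k`, `w_score` are hypothetical stand-ins. The sub-pixel gradient mechanism and the spatial-structure feature encoding are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def integral_regions(fmap, grid=3):
    """Average-pool every axis-aligned rectangle of cells on a grid x grid layout.

    fmap: (H, W, C) backbone feature map (assumed given, with H, W >= grid).
    Returns an (R, C) array with R = (grid * (grid + 1) // 2) ** 2 regions.
    Hypothetical stand-in for CAP's integral regions, not the paper's scheme.
    """
    H, W, _ = fmap.shape
    rows = np.linspace(0, H, grid + 1).astype(int)
    cols = np.linspace(0, W, grid + 1).astype(int)
    feats = []
    for r0 in range(grid):
        for r1 in range(r0 + 1, grid + 1):
            for c0 in range(grid):
                for c1 in range(c0 + 1, grid + 1):
                    patch = fmap[rows[r0]:rows[r1], cols[c0]:cols[c1]]
                    feats.append(patch.mean(axis=(0, 1)))
    return np.stack(feats)


def attentional_pool(regions, w_q, w_k, w_score):
    """Pool R region features into one image descriptor.

    regions: (R, C) region features.
    w_q, w_k: (C, D) projections; w_score: (C,) scorer (all hypothetical).
    Returns the pooled (C,) descriptor and the (R,) region importance weights.
    """
    # Context step: every region attends to every region (scaled dot-product
    # self-attention, used here as a stand-in for CAP's context module).
    q = regions @ w_q                                       # (R, D)
    k = regions @ w_k                                       # (R, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)   # (R, R), rows sum to 1
    context = attn @ regions                                # (R, C)

    # Importance step: score each context-aware region, normalise over regions.
    importance = softmax(context @ w_score)                 # (R,)

    # Attentional pooling: importance-weighted sum of region features.
    pooled = importance @ context                           # (C,)
    return pooled, importance


rng = np.random.default_rng(0)
C, D = 16, 8
fmap = rng.standard_normal((6, 6, C))   # toy stand-in for a backbone feature map
regions = integral_regions(fmap)
pooled, importance = attentional_pool(
    regions,
    rng.standard_normal((C, D)),  # hypothetical w_q
    rng.standard_normal((C, D)),  # hypothetical w_k
    rng.standard_normal(C),       # hypothetical w_score
)
print(regions.shape, pooled.shape, importance.shape)  # (36, 16) (16,) (36,)
```

Because the importance weights are a softmax over regions, the pooled descriptor is a convex combination of the context-aware region features. A classification head on top of this descriptor would complete the "on top of a backbone" pipeline the abstract mentions; it is left out here.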
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
(arXiv, 2023-09-01)
- SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization
We propose a method that captures subtle changes by aggregating context-aware features from most relevant image-regions.
Our approach is inspired by recent advances in self-attention and graph neural networks (GNNs).
It outperforms the state-of-the-art approaches by a significant margin in recognition accuracy.
(arXiv, 2022-09-05)
- R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories; its main challenges are large intra-class diversity and subtle inter-class differences.
We present a novel approach for FGVC that simultaneously makes use of partial yet sufficient discriminative information in environmental cues and compresses the redundant information in the class token with respect to the target.
(arXiv, 2022-04-21)
- Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
We propose regional semantic contrast and aggregation (RCA) for learning semantic segmentation.
RCA is equipped with a regional memory bank to store massive, diverse object patterns appearing in training data.
RCA gains a strong capability for fine-grained semantic understanding and establishes new state-of-the-art results on two popular benchmarks.
(arXiv, 2022-03-17)
- Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision
This paper tackles the problem of fine-grained region detection in deformed clothes using only a depth image.
We define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points.
We introduce a U-net based network to segment and label these parts.
We show that training our network solely with synthetic data and the proposed domain adaptation (DA) strategy yields results competitive with models trained on real data.
(arXiv, 2021-10-06)
- Dense Contrastive Visual-Linguistic Pretraining
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
(arXiv, 2021-09-24)
- Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation
Domain adaptive semantic segmentation refers to making predictions on a certain target domain with only annotations of a specific source domain.
We present a semantic distribution-aware contrastive adaptation (SDCA) algorithm that enables pixel-wise representation alignment.
We evaluate SDCA on multiple benchmarks, achieving considerable improvements over existing algorithms.
(arXiv, 2021-05-11)
- TransFG: A Transformer Architecture for Fine-grained Recognition
Recently, the vision transformer (ViT) has shown strong performance on traditional classification tasks.
We propose a novel transformer-based framework TransFG where we integrate all raw attention weights of the transformer into an attention map.
A contrastive loss is applied to further enlarge the distance between feature representations of similar sub-classes.
(arXiv, 2021-03-14)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
(arXiv, 2020-07-03)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose integrating a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
(arXiv, 2020-05-15)