Weakly Supervised Attention Pyramid Convolutional Neural Network for
Fine-Grained Visual Classification
- URL: http://arxiv.org/abs/2002.03353v1
- Date: Sun, 9 Feb 2020 12:33:23 GMT
- Title: Weakly Supervised Attention Pyramid Convolutional Neural Network for
Fine-Grained Visual Classification
- Authors: Yifeng Ding, Shaoguo Wen, Jiyang Xie, Dongliang Chang, Zhanyu Ma,
Zhongwei Si, Haibin Ling
- Abstract summary: We introduce Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representation.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
- Score: 71.96618723152487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifying the sub-categories of an object from the same super-category
(e.g. bird species, car and aircraft models) in fine-grained visual
classification (FGVC) highly relies on discriminative feature representation
and accurate region localization. Existing approaches mainly focus on
distilling information from high-level features. In this paper, however, we
show that by integrating low-level information (e.g. color, edge junctions,
texture patterns), performance can be improved with enhanced feature
representation and accurately located discriminative regions. Our solution,
named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of a) a
pyramidal hierarchy structure with a top-down feature pathway and a bottom-up
attention pathway, and hence learns both high-level semantic and low-level
detailed feature representation, and b) an ROI guided refinement strategy with
ROI guided dropblock and ROI guided zoom-in, which refines features with
discriminative local regions enhanced and background noises eliminated. The
proposed AP-CNN can be trained end-to-end, without the need of additional
bounding box/part annotations. Extensive experiments on three commonly used
FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that
our approach can achieve state-of-the-art performance. Code available at
http://dwz1.cc/ci8so8a
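The ROI guided refinement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the thresholding rule in `attention_roi`, the `ratio` parameter, and all function names are assumptions made here for illustration, and a single 2-D grid stands in for a feature map. It only shows the two ideas named above: masking out a located region (dropblock-style) and cropping to it (zoom-in).

```python
from typing import List, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def attention_roi(attn: Grid, ratio: float = 0.5) -> Box:
    """Bounding box of cells with attention >= ratio * max (hypothetical rule)."""
    peak = max(max(row) for row in attn)
    hits = [(r, c) for r, row in enumerate(attn)
            for c, v in enumerate(row) if v >= ratio * peak]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def drop_region(feat: Grid, box: Box) -> Grid:
    """Return a copy of feat with the boxed region zeroed (dropblock-style)."""
    top, left, bottom, right = box
    return [[0.0 if top <= r <= bottom and left <= c <= right else v
             for c, v in enumerate(row)]
            for r, row in enumerate(feat)]


def zoom_in(feat: Grid, box: Box) -> Grid:
    """Crop feat to the box; a real pipeline would also resize the crop."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in feat[top:bottom + 1]]


if __name__ == "__main__":
    attn = [[0.1, 0.2, 0.1, 0.0],
            [0.2, 0.9, 0.8, 0.1],
            [0.1, 0.7, 1.0, 0.1],
            [0.0, 0.1, 0.1, 0.0]]
    feat = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
    box = attention_roi(attn)
    print(box)                     # (1, 1, 2, 2)
    print(zoom_in(feat, box))      # [[6.0, 7.0], [10.0, 11.0]]
    print(drop_region(feat, box))  # central 2x2 block set to 0.0
```

In this simplified reading, dropping the most attended region during training would push a classifier to look for other discriminative parts, while cropping to the attended region discards background before a second pass.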
Related papers
- TOPIQ: A Top-down Approach from Semantics to Distortions for Image
Quality Assessment [53.72721476803585]
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks.
We propose a top-down approach that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions.
A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower level features.
arXiv Detail & Related papers (2023-08-06T09:08:37Z)
- Cross-layer Attention Network for Fine-grained Visual Categorization [12.249254142531381]
Learning discriminative representations for subtle localized details plays a significant role in Fine-grained Visual Categorization (FGVC).
We build a mutual refinement mechanism between the mid-level feature maps and the top-level feature map by our proposed Cross-layer Attention Network (CLAN).
Experimental results show our approach achieves state-of-the-art on three publicly available fine-grained recognition datasets.
arXiv Detail & Related papers (2022-10-17T06:57:51Z)
- Local Augmentation for Graph Neural Networks [78.48812244668017]
We introduce local augmentation, which enhances node features using their local subgraph structures.
Based on local augmentation, we further design a novel framework, LA-GNN, which can be applied to any GNN model in a plug-and-play manner.
arXiv Detail & Related papers (2021-09-08T18:10:08Z)
- Feature Fusion Vision Transformer for Fine-Grained Visual Categorization [22.91753200323264]
We propose a novel pure transformer-based framework, Feature Fusion Vision Transformer (FFVT).
We aggregate the important tokens from each transformer layer to compensate for the local, low-level and middle-level information.
We design a novel token selection module called mutual attention weight selection (MAWS) to guide the network effectively and efficiently towards selecting discriminative tokens.
arXiv Detail & Related papers (2021-07-06T01:48:43Z)
- Cross-layer Navigation Convolutional Neural Network for Fine-grained
Visual Classification [21.223130735592516]
Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class.
For the FGVC tasks, the essential solution is to find discriminative subtle information of the target from local regions.
We propose cross-layer navigation convolutional neural network for feature fusion.
arXiv Detail & Related papers (2021-06-21T08:38:27Z)
- Re-rank Coarse Classification with Local Region Enhanced Features for
Fine-Grained Image Recognition [22.83821575990778]
We re-rank the TopN classification results by using the local region enhanced embedding features to improve the Top1 accuracy.
To learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure.
Our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
arXiv Detail & Related papers (2021-02-19T11:30:25Z)
- Multi-Level Graph Convolutional Network with Automatic Graph Learning
for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
- Unsupervised Feedforward Feature (UFF) Learning for Point Cloud
Classification and Segmentation [57.62713515497585]
Unsupervised feedforward feature learning is proposed for joint classification and segmentation of 3D point clouds.
The UFF method exploits statistical correlations of points in a point cloud set to learn shape and point features in a one-pass feedforward manner.
It learns global shape features through the encoder and local point features through the encoder-decoder architecture.
arXiv Detail & Related papers (2020-09-02T18:25:25Z)
- Hierarchical Bi-Directional Feature Perception Network for Person
Re-Identification [12.259747100939078]
Previous Person Re-Identification (Re-ID) models aim to focus on the most discriminative region of an image.
We propose a novel model named Hierarchical Bi-directional Feature Perception Network (HBFP-Net) to correlate multi-level information and reinforce each other.
Experiments on mainstream benchmarks, including the Market-1501, CUHK03 and DukeMTMC-ReID datasets, show that our method outperforms recent SOTA Re-ID models.
arXiv Detail & Related papers (2020-08-08T12:33:32Z)
- Fine-Grained Visual Classification with Efficient End-to-end
Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.