Cross-layer Navigation Convolutional Neural Network for Fine-grained
Visual Classification
- URL: http://arxiv.org/abs/2106.10920v1
- Date: Mon, 21 Jun 2021 08:38:27 GMT
- Title: Cross-layer Navigation Convolutional Neural Network for Fine-grained
Visual Classification
- Authors: Chenyu Guo, Jiyang Xie, Kongming Liang, Xian Sun, Zhanyu Ma
- Abstract summary: Fine-grained visual classification (FGVC) aims to classify sub-classes of objects in the same super-class.
For the FGVC tasks, the essential solution is to find discriminative subtle information of the target from local regions.
We propose a cross-layer navigation convolutional neural network for feature fusion.
- Score: 21.223130735592516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained visual classification (FGVC) aims to classify sub-classes of
objects in the same super-class (e.g., species of birds, models of cars). For
the FGVC tasks, the essential solution is to find discriminative subtle
information of the target from local regions. Traditional FGVC models prefer
to use refined features, i.e., high-level semantic information, for recognition
and rarely use low-level information. However, low-level information, which
contains rich detail, also helps improve performance. Therefore, in this paper,
we propose a cross-layer
navigation convolutional neural network for feature fusion. First, the feature
maps extracted by the backbone network are fed into a convolutional long
short-term memory model sequentially from high-level to low-level to perform
feature aggregation. Then, attention mechanisms are used after feature fusion
to extract spatial and channel information while linking the high-level
semantic information and the low-level texture features, which can better
locate the discriminative regions for FGVC. In the experiments, three
commonly used FGVC datasets, CUB-200-2011, Stanford-Cars, and FGVC-Aircraft,
are used for evaluation, and comparisons with other FGVC methods demonstrate
the superiority of the proposed method.
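To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of the two steps it describes: backbone feature maps are fed into a ConvLSTM-style cell sequentially from high level to low level, and the fused map is then refined by channel and spatial attention before classification. The module names, the ResNet-50-style channel widths (2048/1024/512), the common 256-channel projection, and the squeeze-and-excitation-style attention are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """A minimal convolutional LSTM cell: all gates come from one 3x3 conv."""

    def __init__(self, channels):
        super().__init__()
        # input and hidden state are concatenated, then projected to the 4 gates
        self.gates = nn.Conv2d(2 * channels, 4 * channels, kernel_size=3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class ChannelSpatialAttention(nn.Module):
    """Generic channel attention (squeeze-and-excitation style) followed by a
    single-map spatial attention; the exact design in the paper may differ."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, ch, _, _ = x.shape
        w = self.channel_fc(x.mean(dim=(2, 3))).view(b, ch, 1, 1)  # channel weights
        x = x * w
        s = torch.sigmoid(self.spatial_conv(x))                    # spatial map
        return x * s


class CrossLayerFusion(nn.Module):
    """Aggregates backbone feature maps sequentially from high to low level."""

    def __init__(self, in_channels=(2048, 1024, 512), common=256, num_classes=200):
        super().__init__()
        self.common = common
        # 1x1 convs project every backbone stage to a common channel width
        self.reduce = nn.ModuleList(nn.Conv2d(c, common, kernel_size=1) for c in in_channels)
        self.cell = ConvLSTMCell(common)
        self.attend = ChannelSpatialAttention(common)
        self.classifier = nn.Linear(common, num_classes)

    def forward(self, feats):
        # feats: backbone maps ordered high-level -> low-level, e.g. [C5, C4, C3]
        # from a ResNet-50 (the choice of backbone is an assumption here).
        target_size = feats[-1].shape[-2:]
        h = feats[0].new_zeros(feats[0].shape[0], self.common, *target_size)
        c = torch.zeros_like(h)
        for proj, f in zip(self.reduce, feats):
            x = F.interpolate(proj(f), size=target_size,
                              mode="bilinear", align_corners=False)
            h, c = self.cell(x, (h, c))   # sequential high -> low aggregation
        fused = self.attend(h)            # channel + spatial attention after fusion
        return self.classifier(fused.mean(dim=(2, 3)))


if __name__ == "__main__":
    # Fake ResNet-50-style pyramid for a 448x448 input (shapes are assumptions).
    feats = [torch.randn(2, 2048, 14, 14),
             torch.randn(2, 1024, 28, 28),
             torch.randn(2, 512, 56, 56)]
    print(CrossLayerFusion()(feats).shape)  # -> torch.Size([2, 200])
```

Swapping in a different gating scheme or attention module only requires replacing ConvLSTMCell or ChannelSpatialAttention; the high-to-low ordering of the input feature list is the part that mirrors the cross-layer navigation idea.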
Related papers
- PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called the Point-Voxel Attention Fusion Network (PVAFN).
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - ELFIS: Expert Learning for Fine-grained Image Recognition Using Subsets [6.632855264705276]
We propose ELFIS, an expert learning framework for Fine-Grained Visual Recognition.
A set of neural network-based experts is trained, each focusing on a meta-category, and integrated into a multi-task framework.
Experiments show improvements on SoTA FGVR benchmarks of up to +1.3% accuracy using both CNNs and transformer-based networks.
arXiv Detail & Related papers (2023-03-16T12:45:19Z) - R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction [21.11038841356125]
Fine-grained visual categorization (FGVC) aims to discriminate similar subcategories, whose main challenges are large intra-class diversity and subtle inter-class differences.
We present a novel approach for FGVC, which can simultaneously make use of partial yet sufficient discriminative information in environmental cues and also compress the redundant information in class-token with respect to the target.
arXiv Detail & Related papers (2022-04-21T13:35:38Z) - Feature Fusion Vision Transformer for Fine-Grained Visual Categorization [22.91753200323264]
We propose a novel pure transformer-based framework, Feature Fusion Vision Transformer (FFVT).
We aggregate the important tokens from each transformer layer to compensate for the local, low-level, and middle-level information.
We design a novel token selection module called mutual attention weight selection (MAWS) to guide the network effectively and efficiently towards selecting discriminative tokens.
arXiv Detail & Related papers (2021-07-06T01:48:43Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Unsupervised Feedforward Feature (UFF) Learning for Point Cloud
Classification and Segmentation [57.62713515497585]
Unsupervised feedforward feature learning is proposed for joint classification and segmentation of 3D point clouds.
The UFF method exploits statistical correlations of points in a point cloud set to learn shape and point features in a one-pass feedforward manner.
It learns global shape features through the encoder and local point features through the encoder-decoder architecture.
arXiv Detail & Related papers (2020-09-02T18:25:25Z) - Fine-Grained Visual Classification with Efficient End-to-end
Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z) - Global Context-Aware Progressive Aggregation Network for Salient Object
Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z) - Weakly Supervised Attention Pyramid Convolutional Neural Network for
Fine-Grained Visual Classification [71.96618723152487]
We introduce the Attention Pyramid Convolutional Neural Network (AP-CNN).
AP-CNN learns both high-level semantic and low-level detailed feature representations.
It can be trained end-to-end, without the need of additional bounding box/part annotations.
arXiv Detail & Related papers (2020-02-09T12:33:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.