Fine-Grained Visual Classification via Simultaneously Learning of
Multi-regional Multi-grained Features
- URL: http://arxiv.org/abs/2102.00367v1
- Date: Sun, 31 Jan 2021 03:46:10 GMT
- Title: Fine-Grained Visual Classification via Simultaneously Learning of
Multi-regional Multi-grained Features
- Authors: Dongliang Chang, Yixiao Zheng, Zhanyu Ma, Ruoyi Du, Kongming Liang
- Abstract summary: Fine-grained visual classification is a challenging task that recognizes the sub-classes belonging to the same meta-class.
In this paper, we argue that mining multi-regional multi-grained features is precisely the key to this task.
Experimental results over four widely used fine-grained image classification datasets demonstrate the effectiveness of the proposed method.
- Score: 15.71408474557042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained visual classification is a challenging task that recognizes the
sub-classes belonging to the same meta-class. Large inter-class similarity and
intra-class variance are the main challenges of this task. Most existing methods
try to solve this problem by designing complex model structures to explore more
minute and discriminative regions. In this paper, we argue that mining
multi-regional multi-grained features is precisely the key to this task.
Specifically, we introduce a new loss function, termed top-down spatial
attention loss (TDSA-Loss), which contains a multi-stage channel constrained
module and a top-down spatial attention module. The multi-stage channel
constrained module aims to make the feature channels in different stages
category-aligned. Meanwhile, the top-down spatial attention module uses the
attention map generated by high-level aligned feature channels to make
middle-level aligned feature channels focus on particular regions. Finally,
we can obtain multiple discriminative regions on high-level feature channels
and obtain multiple more minute regions within these discriminative regions on
middle-level feature channels. In summary, we obtain multi-regional
multi-grained features. Experimental results over four widely used fine-grained
image classification datasets demonstrate the effectiveness of the proposed
method. Ablation studies further show the superiority of the two modules in the
proposed method. Code is available at:
https://github.com/dongliangchang/Top-Down-Spatial-Attention-Loss.
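The top-down spatial attention idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that); the abstract does not give the exact formulation, so everything here is an assumption made for illustration: channels are taken to be category-aligned in contiguous groups of equal size, the attention map is a min-max normalized channel average, upsampling is nearest-neighbour with integer factors, and all function and parameter names (`class_group`, `spatial_attention`, `upsample_nearest`, `top_down_mask`, `cpc_high`, `cpc_mid`) are hypothetical.

```python
import numpy as np


def class_group(features, label, channels_per_class):
    """Return the channels ASSUMED to be aligned with class `label`.

    `features` has shape (num_classes * channels_per_class, H, W).
    Contiguous grouping is an illustrative assumption.
    """
    start = label * channels_per_class
    return features[start:start + channels_per_class]


def spatial_attention(high_group):
    """Average a channel group and min-max normalize the map to [0, 1]."""
    attn = high_group.mean(axis=0)
    span = attn.max() - attn.min()
    if span == 0:
        # A flat map carries no spatial preference; attend everywhere.
        return np.ones_like(attn)
    return (attn - attn.min()) / span


def upsample_nearest(attn, out_hw):
    """Nearest-neighbour upsampling; assumes integer scale factors."""
    fh = out_hw[0] // attn.shape[0]
    fw = out_hw[1] // attn.shape[1]
    return np.repeat(np.repeat(attn, fh, axis=0), fw, axis=1)


def top_down_mask(high, mid, label, cpc_high, cpc_mid):
    """Weight the mid-level class group by the high-level attention map."""
    attn = spatial_attention(class_group(high, label, cpc_high))
    attn_up = upsample_nearest(attn, mid.shape[1:])
    return class_group(mid, label, cpc_mid) * attn_up[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = rng.random((2 * 2, 2, 2))   # 2 classes, 2 channels each, 2x2 maps
    mid = rng.random((2 * 3, 4, 4))    # 2 classes, 3 channels each, 4x4 maps
    masked = top_down_mask(high, mid, label=1, cpc_high=2, cpc_mid=3)
    print(masked.shape)
```

In this sketch the mid-level channels of a class are suppressed wherever the high-level channels of the same class are inactive, which mirrors the abstract's description of finding more minute regions inside the discriminative regions located at the higher level. How the attention enters the actual loss is not stated in the abstract and is left out here.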
Related papers
- Task-Oriented Channel Attention for Fine-Grained Few-Shot Classification [5.4352987210173955]
Task Discrepancy Maximization (TDM) is a task-oriented channel attention method tailored for fine-grained few-shot classification.
SAM highlights channels encoding class-wise discriminative features, while QAM assigns higher weights to object-relevant channels of the query.
Based on these submodules, TDM produces task-adaptive features by focusing on channels encoding class-discriminative details and possessed by the query.
arXiv Detail & Related papers (2023-07-28T08:40:23Z)
- Multi-spectral Class Center Network for Face Manipulation Detection and Localization [52.569170436393165]
We propose a novel Multi-Spectral Class Center Network (MSCCNet) for face manipulation detection and localization.
Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations.
Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts which is insensitive to manipulated regions of forgery images.
arXiv Detail & Related papers (2023-05-18T08:09:20Z)
- Multi-Scale Feature Fusion: Learning Better Semantic Segmentation for Road Pothole Detection [9.356003255288417]
This paper presents a novel pothole detection approach based on single-modal semantic segmentation.
It first extracts visual features from input images using a convolutional neural network.
A channel attention module then reweighs the channel features to enhance the consistency of different feature maps.
arXiv Detail & Related papers (2021-12-24T15:07:47Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Channel-wise Knowledge Distillation for Dense Prediction [73.99057249472735]
We propose to align features channel-wise between the student and teacher networks.
We consistently achieve superior performance on three benchmarks with various network structures.
arXiv Detail & Related papers (2020-11-26T12:00:38Z)
- Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule.
To focus more on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated from the former branch.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
- Concentrated Multi-Grained Multi-Attention Network for Video Based Person Re-Identification [5.761429719197307]
Occlusion is still a severe problem in the video-based Re-IDentification (Re-ID) task.
We propose a Concentrated Multi-grained Multi-Attention Network (CMMANet).
arXiv Detail & Related papers (2020-09-28T02:18:06Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Multi-Task Learning via Co-Attentive Sharing for Pedestrian Attribute Recognition [8.883961218702824]
Co-Attentive Sharing (CAS) module extracts discriminative channels and spatial regions for more effective feature sharing in multi-task learning.
Our module outperforms the conventional sharing units and achieves superior results compared to the state-of-the-art approaches using many metrics.
arXiv Detail & Related papers (2020-04-07T07:24:22Z)
- DFNet: Discriminative feature extraction and integration network for salient object detection [6.959742268104327]
We focus on two aspects of challenges in saliency detection using Convolutional Neural Networks.
Firstly, since salient objects appear in various sizes, using single-scale convolution would not capture the right size.
Secondly, using multi-level features helps the model use both local and global context.
arXiv Detail & Related papers (2020-04-03T13:56:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.