Fine-Grained Visual Classification with Efficient End-to-end
Localization
- URL: http://arxiv.org/abs/2005.05123v1
- Date: Mon, 11 May 2020 14:07:06 GMT
- Title: Fine-Grained Visual Classification with Efficient End-to-end
Localization
- Authors: Harald Hanselmann and Hermann Ney
- Abstract summary: We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
- Score: 49.9887676289364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The term fine-grained visual classification (FGVC) refers to classification
tasks where the classes are very similar and the classification model needs to
be able to find subtle differences to make the correct prediction.
State-of-the-art approaches often include a localization step designed to help
a classification network by localizing the relevant parts of the input images.
However, this usually requires multiple iterations or passes through a full
classification network or complex training schedules. In this work we present
an efficient localization module that can be fused with a classification
network in an end-to-end setup. On the one hand, the module is trained by the
gradient flowing back from the classification network. On the other hand, two
self-supervised loss functions are introduced to increase the localization
accuracy. We evaluate the new model on the three benchmark datasets
CUB200-2011, Stanford Cars and FGVC-Aircraft and are able to achieve
competitive recognition performance.
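The pipeline described in the abstract can be sketched minimally: a localization step predicts a crop window, the crop is taken with bilinear sampling so that it stays differentiable with respect to the window parameters, and the classifier runs on the crop. This is an illustrative sketch of the general idea, not the authors' module; the window parametrization and sizes below are assumptions.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Bilinearly sample img (H, W) at continuous coords (ys[i], xs[j])."""
    H, W = img.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)]
            + (1 - wy) * wx * img[np.ix_(y0, x1)]
            + wy * (1 - wx) * img[np.ix_(y1, x0)]
            + wy * wx * img[np.ix_(y1, x1)])

def localize_and_crop(img, cx, cy, scale, out_hw=(8, 8)):
    """Crop a scale-sized window centred at (cx, cy), all in [0, 1] coords.
    Because the sampling is bilinear, the crop is differentiable w.r.t.
    (cx, cy, scale) -- this is what lets the classifier's gradient train
    a localization head end-to-end."""
    H, W = img.shape
    oh, ow = out_hw
    ys = (cy + scale * np.linspace(-0.5, 0.5, oh)) * (H - 1)
    xs = (cx + scale * np.linspace(-0.5, 0.5, ow)) * (W - 1)
    return bilinear_sample(img, ys, xs)

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
crop = localize_and_crop(img, cx=0.5, cy=0.5, scale=0.25)
print(crop.shape)  # (8, 8)
```

In a full model, (cx, cy, scale) would be predicted by a small head on backbone features, so the classification loss alone already provides a training signal for the localization; the paper's two self-supervised losses add further supervision on top of that.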
Related papers
- Fine-Grained Visual Classification using Self Assessment Classifier [12.596520707449027]
Extracting discriminative features plays a crucial role in the fine-grained visual classification task.
In this paper, we introduce a Self Assessment Classifier, which simultaneously leverages the representation of the image and the top-k prediction classes.
We show that our method achieves new state-of-the-art results on CUB200-2011, Stanford Dog, and FGVC Aircraft datasets.
arXiv Detail & Related papers (2022-05-21T07:41:27Z)
- A Novel Plug-in Module for Fine-Grained Visual Classification [0.19336815376402716]
We propose a novel plug-in module that can be integrated into many common backbones to provide strongly discriminative regions.
Experimental results show that the proposed plugin module outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-08T12:35:58Z)
- Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
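The normalized classifier mentioned above is commonly realized as a cosine classifier: both the feature and each class weight vector are L2-normalized, so logit magnitudes no longer inherit the weight-norm bias that majority classes accumulate under long-tailed training. A minimal sketch, with a hypothetical scale hyper-parameter `tau`:

```python
import numpy as np

def cosine_logits(feat, weights, tau=16.0):
    """Logits from a normalized ('cosine') classifier.
    feat: (D,) feature; weights: (K, D), one row per class.
    Normalizing both sides makes each logit a scaled cosine
    similarity, bounded by tau in absolute value."""
    f = feat / np.linalg.norm(feat)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return tau * (w @ f)  # (K,) scaled cosine similarities

rng = np.random.default_rng(0)
logits = cosine_logits(rng.normal(size=64), rng.normal(size=(10, 64)))
print(logits.shape)  # (10,)
```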
arXiv Detail & Related papers (2021-08-29T05:45:03Z)
- Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition [22.83821575990778]
We re-rank the TopN classification results by using the local region enhanced embedding features to improve the Top1 accuracy.
To learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure.
Our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
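A minimal sketch of the re-ranking idea: keep the TopN classes from the coarse global classifier, then re-score only those candidates by similarity between a local-region-enhanced embedding and per-class prototype embeddings. The prototypes and the mixing weight `alpha` here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rerank_topn(global_logits, query_emb, class_protos, n=5, alpha=0.5):
    """Re-rank the TopN classes of a coarse classifier.
    global_logits: (K,), query_emb: (D,), class_protos: (K, D).
    alpha mixes the coarse score with local-embedding cosine similarity."""
    topn = np.argsort(global_logits)[::-1][:n]          # TopN candidate classes
    q = query_emb / np.linalg.norm(query_emb)
    p = class_protos[topn]
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    mixed = alpha * global_logits[topn] + (1 - alpha) * (p @ q)
    return topn[np.argsort(mixed)[::-1]]                # TopN ids, re-ordered

rng = np.random.default_rng(1)
order = rerank_topn(rng.normal(size=100), rng.normal(size=32),
                    rng.normal(size=(100, 32)))
print(order.shape)  # (5,)
```

Only the Top1 can change here; classes outside the TopN are never promoted, which is what keeps the re-ranking cheap.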
arXiv Detail & Related papers (2021-02-19T11:30:25Z)
- Equivalent Classification Mapping for Weakly Supervised Temporal Action Localization [92.58946210982411]
Weakly supervised temporal action localization has emerged recently and has been widely studied.
The pre-classification pipeline first performs classification on each video snippet and then aggregates the snippet-level classification scores to obtain the video-level classification score.
The post-classification pipeline aggregates the snippet-level features first and then predicts the video-level classification score based on the aggregated feature.
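The two pipelines differ only in where the aggregation happens. With mean aggregation and a linear classifier the difference can be shown in a few lines (the linear classifier is a stand-in for whatever snippet model is actually used):

```python
import numpy as np

rng = np.random.default_rng(2)
snippets = rng.normal(size=(30, 128))   # 30 snippet-level features
W = rng.normal(size=(128, 20))          # linear classifier, 20 action classes

# Pre-classification: classify each snippet, then aggregate the scores.
pre = (snippets @ W).mean(axis=0)

# Post-classification: aggregate the features, then classify once.
post = snippets.mean(axis=0) @ W

# With a linear classifier and mean pooling the two orders coincide;
# with a non-linear classifier or attention pooling they generally differ.
print(np.allclose(pre, post))  # True
```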
arXiv Detail & Related papers (2020-08-18T03:54:56Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both purely unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
- Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection [85.53263670166304]
One-stage detectors formulate object detection as dense classification and localization.
Recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization.
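One outcome of that analysis is the Quality Focal Loss, which merges the classification label and the localization quality (e.g. IoU) into a single soft target y in [0, 1]. A sketch of the loss as commonly stated, QFL(sigma) = -|y - sigma|^beta * ((1-y) log(1-sigma) + y log sigma):

```python
import numpy as np

def quality_focal_loss(sigma, y, beta=2.0):
    """Quality Focal Loss: a focal-style loss whose target y is a soft
    IoU-based quality score in [0, 1] rather than a hard 0/1 label.
    The |y - sigma|**beta factor down-weights already-accurate predictions."""
    sigma = np.clip(sigma, 1e-7, 1 - 1e-7)
    ce = -((1 - y) * np.log(1 - sigma) + y * np.log(sigma))
    return np.abs(y - sigma) ** beta * ce

# The loss vanishes when the prediction matches the soft target exactly...
print(quality_focal_loss(0.7, 0.7))  # 0.0
# ...and grows as the prediction drifts further from it.
print(quality_focal_loss(0.2, 0.7) > quality_focal_loss(0.6, 0.7))  # True
```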
arXiv Detail & Related papers (2020-06-08T07:24:33Z)
- Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between the long-tailed class distributions seen during training and the balanced performance expected on all classes at test time.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
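"Classic class-balanced learning" typically re-weights the loss by inverse (effective) class frequency; the meta-learning refinement above then accounts for how the class-conditioned distributions themselves differ. The baseline weighting is short to write down; the effective-number variant shown is one common choice, not necessarily the one used in the paper:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class loss weights via the 'effective number of samples':
    w_c proportional to (1 - beta) / (1 - beta**n_c). As beta -> 1 this
    approaches inverse-frequency weighting; beta = 0 gives uniform weights."""
    eff = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff
    return w * (len(counts) / w.sum())  # normalize to mean 1

counts = np.array([5000, 500, 50, 5])  # long-tailed class histogram
w = class_balanced_weights(counts)
print(w[-1] > w[0])  # True: rare classes get larger weights
```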
arXiv Detail & Related papers (2020-03-24T11:28:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.