Focus Longer to See Better: Recursively Refined Attention for
Fine-Grained Image Classification
- URL: http://arxiv.org/abs/2005.10979v1
- Date: Fri, 22 May 2020 03:14:18 GMT
- Title: Focus Longer to See Better: Recursively Refined Attention for
Fine-Grained Image Classification
- Authors: Prateek Shroff, Tianlong Chen, Yunchao Wei, Zhangyang Wang
- Abstract summary: Deep Neural Networks have made great strides in the coarse-grained image classification task.
In this paper, we focus on the marginal visual differences between fine-grained classes to extract more representative features.
Our network repeatedly focuses on parts of the image to spot small discriminative regions among the classes.
- Score: 148.4492675737644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks have made great strides in the
coarse-grained image classification task, in part due to their strong
ability to extract discriminative feature representations from images.
However, the marginal visual difference between classes makes the
fine-grained version of this task much harder. In this paper, we focus on
these marginal differences to extract more representative features.
Similar to human vision, our network repeatedly focuses on parts of the
image to spot small discriminative regions among the classes. Moreover, we
show through interpretability techniques how our network's focus shifts
from coarse to fine details. Through our experiments, we also show that a
simple attention model can aggregate (in a weighted manner) these finer
details to focus on the most dominant discriminative part of the image.
Our network uses only image-level labels and does not need bounding box or
part annotations. Further, its simplicity makes it an easy plug-and-play
module. Apart from providing interpretability, our network boosts
performance (by up to 2%) over its baseline counterparts. Our codebase is
available at
https://github.com/TAMU-VITA/Focus-Longer-to-See-Better
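The aggregation step described in the abstract lends itself to a short sketch. Below is a minimal PyTorch illustration (not the authors' code; the module name, shapes, and single-linear scoring head are assumptions) of how a simple softmax attention can pool the features produced by repeated glimpses at image parts into one weighted feature for classification.

```python
# Minimal sketch: softmax attention over per-glimpse features, then a
# weighted sum feeds the classifier. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)          # scores each glimpse feature
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, glimpse_feats: torch.Tensor) -> torch.Tensor:
        # glimpse_feats: (batch, num_glimpses, feat_dim), one feature
        # per recursive focusing step.
        weights = torch.softmax(self.score(glimpse_feats), dim=1)  # (B, G, 1)
        pooled = (weights * glimpse_feats).sum(dim=1)              # weighted sum
        return self.classifier(pooled)

# Usage: aggregate 4 glimpse features of dimension 512 into 200 class logits.
logits = AttentionAggregator(512, 200)(torch.randn(2, 4, 512))
```

In this reading, the recursion supplies the glimpse features and the aggregator needs only image-level labels, consistent with the paper's annotation-free setup.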
Related papers
- Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization [17.93878159391899]
We describe MACO, a simple approach to generate interpretable images.
Our approach yields significantly better results (both qualitatively and quantitatively) and unlocks efficient and interpretable feature visualizations for large state-of-the-art neural networks.
We validate our method on a novel benchmark for comparing feature visualization methods, and release its visualizations for all classes of the ImageNet dataset.
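The name suggests the core trick, which can be sketched compactly: optimize only the Fourier phase of the visualization while keeping the magnitude spectrum fixed. The 1/f magnitude prior, image size, and toy objective below are illustrative assumptions, not details taken from the paper.

```python
# Magnitude-constrained feature visualization sketch: the phase is the
# only optimized variable; the fixed magnitude keeps frequency statistics
# natural. The "objective" stands in for a unit's activation.
import torch

H = W = 128
# Fixed 1/f-style magnitude spectrum (a common natural-image prior).
fy = torch.fft.fftfreq(H).reshape(-1, 1)
fx = torch.fft.rfftfreq(W).reshape(1, -1)
magnitude = 1.0 / (fx ** 2 + fy ** 2).sqrt().clamp(min=1.0 / max(H, W))

phase = torch.randn(H, W // 2 + 1, requires_grad=True)  # optimized variable
opt = torch.optim.Adam([phase], lr=0.05)

def objective(img: torch.Tensor) -> torch.Tensor:
    # Placeholder for the activation of the unit being visualized.
    return img.mean()

for _ in range(100):
    spectrum = magnitude * torch.exp(1j * phase)   # fixed |F|, free phase
    img = torch.fft.irfft2(spectrum, s=(H, W))
    loss = -objective(img)                         # maximize the activation
    opt.zero_grad()
    loss.backward()
    opt.step()
```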
arXiv Detail & Related papers (2023-06-11T23:33:59Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
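A speculative sketch of what "aligning distributions of feature similarity" could look like; the anchor set, temperature, and KL formulation below are all assumptions made for illustration, not LEAD's actual objective.

```python
# Each dense feature gets a softmax similarity distribution over a set of
# anchor features; a KL term pulls one network's distribution toward
# another's. Shapes and the temperature are assumptions.
import torch
import torch.nn.functional as F

def similarity_distribution(feats, anchors, tau=0.1):
    # feats: (N, D) dense features; anchors: (K, D) reference features.
    sims = F.normalize(feats, dim=1) @ F.normalize(anchors, dim=1).T
    return F.softmax(sims / tau, dim=1)      # (N, K) per-feature distribution

source = similarity_distribution(torch.randn(100, 128), torch.randn(32, 128))
target_logits = torch.randn(100, 32, requires_grad=True)
target_log_probs = F.log_softmax(target_logits / 0.1, dim=1)

# Alignment loss: pull the target's similarity distribution toward the source's.
loss = F.kl_div(target_log_probs, source, reduction="batchmean")
loss.backward()
```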
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Learning to ignore: rethinking attention in CNNs [87.01305532842878]
We propose to reformulate the attention mechanism in CNNs to learn to ignore instead of learning to attend.
Specifically, we propose to explicitly learn irrelevant information in the scene and suppress it in the produced representation.
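One plausible minimal rendering of "learn to ignore" (my reading, not the paper's code): a small branch predicts an irrelevance map, and the features are scaled by its complement so that learned-irrelevant regions are suppressed rather than relevant ones amplified.

```python
# Sketch: a 1x1 conv predicts per-location irrelevance; features are
# multiplied by (1 - sigmoid(I)) to suppress what the network learns to ignore.
import torch
import torch.nn as nn

class IgnoreAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.irrelevance = nn.Conv2d(channels, 1, kernel_size=1)  # predicts I

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ignore_map = torch.sigmoid(self.irrelevance(x))  # (B, 1, H, W) in [0, 1]
        return x * (1.0 - ignore_map)                    # suppress irrelevant areas

feats = torch.randn(2, 64, 16, 16)
out = IgnoreAttention(64)(feats)
```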
arXiv Detail & Related papers (2021-11-10T13:47:37Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmark, and our model yields state-of-the-art performance.
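A rough sketch of the patch-graph idea under stated assumptions: patches that carry the same class label are connected, and one round of mean-aggregation message passing shares congeneric semantics between them. The adjacency rule, dimensions, and single linear update are illustrative, not P-GNN's exact design.

```python
# Patch features from different images become graph nodes; same-label
# patches are linked; one message-passing step mixes neighbor features.
import torch
import torch.nn as nn

patch_feats = torch.randn(6, 256)            # 6 patch nodes, 256-d features
labels = torch.tensor([0, 0, 1, 1, 0, 1])    # class label of each patch

# Connect patches that share a class label (no self-loops), row-normalized.
adj = (labels[:, None] == labels[None, :]).float()
adj.fill_diagonal_(0)
adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)

update = nn.Linear(256, 256)
messages = adj @ patch_feats                 # mean of same-label neighbors
patch_feats = torch.relu(update(patch_feats + messages))
```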
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
- Learning Discriminative Representations for Multi-Label Image Recognition [13.13795708478267]
We propose a unified deep network to learn discriminative features for the multi-label task.
By regularizing the whole network with the proposed loss, the performance of the well-known ResNet-101 is improved significantly.
arXiv Detail & Related papers (2021-07-23T12:10:46Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
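A compact sketch of a generator over the joint image-label distribution; the two-headed architecture, sizes, and activations below are assumptions chosen for brevity, not the paper's actual network.

```python
# One generator maps a latent code to an image AND an aligned segmentation
# map, so samples (or GAN inversions) come with labels attached.
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    def __init__(self, z_dim=64, num_classes=5, size=32):
        super().__init__()
        self.size, self.num_classes = size, num_classes
        self.trunk = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU())
        self.to_image = nn.Linear(512, 3 * size * size)              # RGB head
        self.to_labels = nn.Linear(512, num_classes * size * size)   # label head

    def forward(self, z):
        h = self.trunk(z)
        img = torch.tanh(self.to_image(h)).view(-1, 3, self.size, self.size)
        seg = self.to_labels(h).view(-1, self.num_classes, self.size, self.size)
        return img, seg                      # seg holds per-class logits

img, seg = JointGenerator()(torch.randn(4, 64))
```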
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
The low-shot image recognition task, a meta-learning challenge, arises when only a few annotated images are available for learning a recognition model for a category.
We propose the Cascaded Feature Matching Network (CFMN) to solve this problem.
Experiments on few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
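Feature matching for few-shot recognition can be sketched in a few lines; the prototype-plus-cosine-similarity scheme below is a generic matching baseline under assumed shapes, not CFMN's cascaded formulation.

```python
# Support features of each class are averaged into a prototype; a query
# is classified by cosine similarity to the prototypes.
import torch
import torch.nn.functional as F

def few_shot_logits(support, support_labels, query, num_classes):
    # support: (N, D) features of the few labeled images; query: (Q, D).
    protos = torch.stack([support[support_labels == c].mean(dim=0)
                          for c in range(num_classes)])
    return F.normalize(query, dim=1) @ F.normalize(protos, dim=1).T

support = torch.randn(10, 64)                        # 5-way, 2-shot support set
labels = torch.arange(5).repeat_interleave(2)
logits = few_shot_logits(support, labels, torch.randn(3, 64), num_classes=5)
```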
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
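Cross-image co-attention can be sketched with a single affinity matrix; this is a bare-bones illustration (one attention direction per image, assumed shapes and projection), not the paper's exact pair of co-attentions.

```python
# An affinity matrix between two images' flattened feature maps lets each
# image attend to semantically similar locations in the other.
import torch
import torch.nn as nn

f1 = torch.randn(1, 256, 14, 14)   # features of image 1
f2 = torch.randn(1, 256, 14, 14)   # features of image 2
proj = nn.Linear(256, 256)

a = f1.flatten(2).transpose(1, 2)            # (1, 196, 256)
b = f2.flatten(2).transpose(1, 2)            # (1, 196, 256)
affinity = proj(a) @ b.transpose(1, 2)       # (1, 196, 196) cross-image affinity
# Each image's locations re-expressed via similar locations in the other.
f1_att = torch.softmax(affinity, dim=-1) @ b
f2_att = torch.softmax(affinity.transpose(1, 2), dim=-1) @ a
```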
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Shallow Feature Based Dense Attention Network for Crowd Counting [103.67446852449551]
We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images.
Our method outperforms existing methods by a large margin, as evidenced by a remarkable 11.9% drop in Mean Absolute Error (MAE) achieved by SDANet.
arXiv Detail & Related papers (2020-06-17T13:34:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.