Improving Fine-Grained Visual Recognition in Low Data Regimes via
Self-Boosting Attention Mechanism
- URL: http://arxiv.org/abs/2208.00617v1
- Date: Mon, 1 Aug 2022 05:36:27 GMT
- Title: Improving Fine-Grained Visual Recognition in Low Data Regimes via
Self-Boosting Attention Mechanism
- Authors: Yangyang Shu, Baosheng Yu, Haiming Xu, Lingqiao Liu
- Abstract summary: Self-boosting attention mechanism (SAM) is a novel method for regularizing the network to focus on the key regions shared across samples and classes.
We also develop a variant that uses SAM to create multiple attention maps for pooling convolutional feature maps in the style of bilinear pooling.
- Score: 27.628260249895973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenge of fine-grained visual recognition often lies in discovering
the key discriminative regions. While such regions can be automatically
identified from a large-scale labeled dataset, a similar method might become
less effective when only a few annotations are available. In low data regimes,
a network often struggles to choose the correct regions for recognition and
tends to overfit spurious correlated patterns from the training data. To tackle
this issue, this paper proposes the self-boosting attention mechanism, a novel
method for regularizing the network to focus on the key regions shared across
samples and classes. Specifically, the proposed method first generates an
attention map for each training image, highlighting the discriminative part for
identifying the ground-truth object category. The generated attention maps are
then used as pseudo-annotations, and the network is trained to fit them as an
auxiliary task. We call this approach the self-boosting attention mechanism
(SAM). We also develop a variant by using SAM to create multiple attention maps
to pool convolutional maps in the style of bilinear pooling, dubbed SAM-Bilinear.
Through extensive experimental studies, we show that both methods can
significantly improve fine-grained visual recognition performance in low data
regimes and can be incorporated into existing network architectures. The source
code is publicly available at: https://github.com/GANPerf/SAM
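The pipeline the abstract describes, generating a class-activation-style attention map, treating it as a pseudo-annotation for an auxiliary fitting task, and using multiple attention maps to pool the convolutional features, can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (which is in the linked repository); the function names and the CAM-style map generation are assumptions.

```python
import numpy as np

def attention_map(feat, w_cls):
    """CAM-style attention: weight the conv channels by the classifier
    weights of the ground-truth class, then rescale to [0, 1].
    feat: (C, H, W) conv feature map; w_cls: (C,) classifier weights."""
    amap = np.tensordot(w_cls, feat, axes=([0], [0]))  # -> (H, W)
    amap = np.maximum(amap, 0.0)                       # keep positive evidence
    rng = amap.max() - amap.min()
    return (amap - amap.min()) / rng if rng > 0 else np.zeros_like(amap)

def sam_aux_loss(pred_map, pseudo_map):
    """Auxiliary regression loss: fit a predicted attention map to the
    pseudo-annotation generated from the network's own activations."""
    return float(np.mean((pred_map - pseudo_map) ** 2))

def sam_bilinear_pool(feat, attn_maps):
    """SAM-Bilinear-style pooling: each attention map pools the conv
    features, and the pooled vectors are concatenated.
    attn_maps: (K, H, W); feat: (C, H, W) -> (K*C,) descriptor."""
    pooled = [(feat * a[None]).sum(axis=(1, 2)) / (a.sum() + 1e-8)
              for a in attn_maps]
    return np.concatenate(pooled)
```

In training, `sam_aux_loss` would be added to the classification loss, so gradients push the network's attention toward regions that are discriminative across samples rather than instance-specific patterns.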
Related papers
- Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization.
It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
arXiv Detail & Related papers (2024-04-15T06:02:09Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression [20.89979858757123]
When the number of training samples is small, or the class labels are noisy, networks tend to memorize patterns specific to individual instances to minimize the training error.
This paper explores a remedy by suppressing the network's tendency to rely on instance-specific patterns for empirical error minimisation.
arXiv Detail & Related papers (2022-09-29T05:14:56Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Clustering augmented Self-Supervised Learning: An application to Land Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel lightweight module, the Attentive WaveBlock (AWB).
AWB can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
arXiv Detail & Related papers (2020-06-11T15:40:40Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)
- SpotNet: Self-Attention Multi-Task Network for Object Detection [11.444576186559487]
We produce foreground/background segmentation labels in a semi-supervised way, using background subtraction or optical flow.
We use those segmentation maps inside the network as a self-attention mechanism to weight the feature map used to produce the bounding boxes.
We show that by using this method, we obtain a significant mAP improvement on two traffic surveillance datasets.
arXiv Detail & Related papers (2020-02-13T14:43:24Z)
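The self-attention weighting described in the SpotNet blurb above, a segmentation prediction modulating the feature map used for detection, reduces to an elementwise product. A minimal NumPy sketch, assuming a sigmoid-gated mask; the function name is hypothetical and not from the paper:

```python
import numpy as np

def spotnet_attention(feature_map, seg_logits):
    """Self-attention weighting in the SpotNet style: a foreground/background
    segmentation prediction gates the backbone feature map elementwise.
    feature_map: (C, H, W); seg_logits: (H, W) raw segmentation scores."""
    attn = 1.0 / (1.0 + np.exp(-seg_logits))  # sigmoid -> soft mask in (0, 1)
    return feature_map * attn[None, :, :]     # broadcast the mask over channels
```

Regions the segmentation branch scores as background are attenuated toward zero, so the detection head sees features concentrated on likely foreground objects.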
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.