Transformer with Peak Suppression and Knowledge Guidance for
Fine-grained Image Recognition
- URL: http://arxiv.org/abs/2107.06538v1
- Date: Wed, 14 Jul 2021 08:07:58 GMT
- Title: Transformer with Peak Suppression and Knowledge Guidance for
Fine-grained Image Recognition
- Authors: Xinda Liu, Lili Wang, Xiaoguang Han
- Abstract summary: We propose a transformer architecture with a peak suppression module and a knowledge guidance module.
The peak suppression module penalizes the attention to the most discriminative parts in the feature learning process.
The knowledge guidance module compares the image-based representation generated from the peak suppression module with the learnable knowledge embedding set to obtain the knowledge response coefficients.
- Score: 24.02553270481428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained image recognition is challenging because discriminative clues
are usually fragmented, whether from a single image or multiple images. Despite
their significant improvements, most existing methods still focus on the most
discriminative parts from a single image, ignoring informative details in other
regions and lacking consideration of clues from other associated images. In
this paper, we analyze the difficulties of fine-grained image recognition from
a new perspective and propose a transformer architecture with the peak
suppression module and knowledge guidance module, which respects the
diversification of discriminative features in a single image and the
aggregation of discriminative clues among multiple images. Specifically, the
peak suppression module first utilizes a linear projection to convert the input
image into sequential tokens. It then blocks the peak tokens based on the
attention responses generated by the transformer encoder. This module penalizes
the attention to the most discriminative parts during feature learning, thereby
enhancing the exploitation of information in the neglected regions. The
knowledge guidance module compares the image-based representation generated
from the peak suppression module with the learnable knowledge embedding set to
obtain the knowledge response coefficients. Afterwards, it formalizes knowledge
learning as a classification problem, using the response coefficients as
the classification scores. Knowledge embeddings and image-based representations
are updated during training so that the knowledge embeddings include
discriminative clues from different images. Finally, we incorporate the acquired
knowledge embeddings into the image-based representations as comprehensive
representations, leading to significantly higher performance. Extensive
evaluations on six popular datasets demonstrate the advantage of the
proposed method.
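Taken together, the two modules admit a compact reading: suppress the peak
tokens during feature learning, then respond an image representation against a
learnable knowledge set. A minimal PyTorch sketch follows; the drop ratio, the
top-k masking rule, the cosine-similarity response, and the additive fusion are
assumptions inferred from the abstract, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PeakSuppression(nn.Module):
    """Masks the most-attended tokens so feature learning is pushed
    toward the otherwise neglected regions (a sketch)."""
    def __init__(self, drop_ratio: float = 0.1):
        super().__init__()
        self.drop_ratio = drop_ratio  # assumed hyperparameter

    def forward(self, tokens: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) encoder tokens; attn: (B, N) attention responses.
        k = max(1, int(self.drop_ratio * tokens.size(1)))
        peak_idx = attn.topk(k, dim=1).indices     # k "peak" tokens per image
        mask = torch.ones_like(attn)
        mask.scatter_(1, peak_idx, 0.0)            # zero out the peaks
        return tokens * mask.unsqueeze(-1)

class KnowledgeGuidance(nn.Module):
    """Compares an image-based representation with a learnable knowledge
    embedding set; the response coefficients double as class scores."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.knowledge = nn.Parameter(torch.randn(num_classes, dim))

    def forward(self, image_repr: torch.Tensor):
        # image_repr: (B, D). Cosine response to each knowledge embedding.
        coeff = F.normalize(image_repr, dim=-1) @ \
                F.normalize(self.knowledge, dim=-1).t()      # (B, num_classes)
        # Fuse the responded knowledge back into the image representation.
        fused = image_repr + coeff @ self.knowledge          # (B, D)
        return coeff, fused
```

In this reading, training supervises `coeff` with cross-entropy so the
knowledge embeddings accumulate discriminative clues across images, while the
final classifier consumes the comprehensive representation `fused`.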
Related papers
- Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning [18.534970504136254]
We propose a novel deep metric learning based method to fuse hierarchical prior knowledge about image classes.
Existing deep metric learning methods incorporated into image classification mainly exploit qualitative relativity between image classes.
A new triplet loss term that exploits quantitative relativity, aligning distances in the model's latent space with those in the knowledge space, is also proposed and incorporated into the dual-modality fusion method, as sketched below.
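One plausible reading of that loss term, as a sketch (deriving the margin from the gap between knowledge-space distances is an assumption, not necessarily the paper's exact formulation):

```python
import torch.nn.functional as F

def quantitative_triplet_loss(anchor, pos, neg, d_know_pos, d_know_neg):
    """Triplet loss whose margin scales with the gap between the two pairs'
    knowledge-space distances, pulling latent distances toward their
    knowledge-space counterparts (a hypothetical sketch)."""
    d_pos = F.pairwise_distance(anchor, pos)   # latent anchor-positive distance
    d_neg = F.pairwise_distance(anchor, neg)   # latent anchor-negative distance
    margin = (d_know_neg - d_know_pos).clamp(min=0.0)  # quantitative relativity
    return F.relu(d_pos - d_neg + margin).mean()
```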
arXiv Detail & Related papers (2024-07-30T07:24:33Z)
- CoReFace: Sample-Guided Contrastive Regularization for Deep Face Recognition [3.1677775852317085]
We propose Contrastive Regularization for Face recognition (CoReFace) to apply image-level regularization in feature representation learning.
Specifically, we employ sample-guided contrastive learning to regularize the training with the image-image relationship directly.
To integrate contrastive learning into face recognition, we augment embeddings instead of images to avoid image quality degradation, as in the sketch below.
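A rough sketch of image-level regularization on augmented embeddings (the Gaussian perturbation and InfoNCE-style loss are assumed stand-ins for the paper's actual augmentation and regularizer):

```python
import torch
import torch.nn.functional as F

def embedding_contrastive_reg(emb: torch.Tensor,
                              temperature: float = 0.1,
                              noise_std: float = 0.05) -> torch.Tensor:
    """Treats a perturbed copy of each embedding as its positive and every
    other embedding in the batch as a negative (a sketch)."""
    emb = F.normalize(emb, dim=-1)
    aug = F.normalize(emb + noise_std * torch.randn_like(emb), dim=-1)
    logits = emb @ aug.t() / temperature               # (B, B) similarities
    targets = torch.arange(emb.size(0), device=emb.device)
    return F.cross_entropy(logits, targets)   # match i-th row to i-th column
```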
arXiv Detail & Related papers (2023-04-23T14:33:24Z)
- SATS: Self-Attention Transfer for Continual Semantic Segmentation [50.51525791240729]
Continual semantic segmentation suffers from the same catastrophic forgetting issue as continual classification learning.
This study proposes to transfer a new type of knowledge, i.e., the relationships between elements within each image.
The relationship information can be effectively obtained from the self-attention maps in a Transformer-style segmentation model, as sketched below.
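A minimal sketch of such a transfer, assuming an L2 penalty between matching self-attention maps of the frozen old model and the new one (the paper's exact distillation loss may differ):

```python
import torch
import torch.nn.functional as F

def attention_transfer_loss(attn_new: torch.Tensor,
                            attn_old: torch.Tensor) -> torch.Tensor:
    """Penalizes drift of within-image element relationships, captured by
    self-attention maps of shape (B, heads, N, N) from matching layers."""
    return F.mse_loss(attn_new, attn_old.detach())  # old model is frozen
```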
arXiv Detail & Related papers (2022-03-15T06:09:28Z)
- Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z)
- Self-supervised Product Quantization for Deep Unsupervised Image Retrieval [21.99902461562925]
Supervised deep learning-based hashing and vector quantization enable fast and large-scale image retrieval systems.
We propose the first deep unsupervised image retrieval method dubbed Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner.
Our method analyzes the image contents to extract descriptive features, allowing us to understand image representations for accurate retrieval; the soft quantization step is sketched below.
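The quantization step in such a network is typically made differentiable via soft assignment; a sketch of a soft product quantizer under that assumption (codebook shapes and temperature are illustrative, not the paper's settings):

```python
import torch

def soft_product_quantize(feat: torch.Tensor,
                          codebooks: torch.Tensor,
                          temperature: float = 0.2) -> torch.Tensor:
    """Splits each feature into M sub-vectors and replaces each with a
    softmax-weighted mix of its codebook's codewords, so gradients flow
    through the assignment (a sketch).
    feat: (B, D); codebooks: (M, K, D // M)."""
    B, D = feat.shape
    M, K, d = codebooks.shape
    sub = feat.view(B, M, d)                               # (B, M, d)
    logits = torch.einsum('bmd,mkd->bmk', sub, codebooks) / temperature
    weights = logits.softmax(dim=-1)              # soft codeword assignment
    quantized = torch.einsum('bmk,mkd->bmd', weights, codebooks)
    return quantized.reshape(B, D)
```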
arXiv Detail & Related papers (2021-09-06T05:02:34Z)
- Learning Discriminative Representations for Multi-Label Image Recognition [13.13795708478267]
We propose a unified deep network to learn discriminative features for the multi-label task.
By regularizing the whole network with the proposed loss, the performance of the well-known ResNet-101 is improved significantly.
arXiv Detail & Related papers (2021-07-23T12:10:46Z)
- Few-Shot Learning with Part Discovery and Augmentation from Unlabeled Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes.
Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations.
Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z)
- Attention Model Enhanced Network for Classification of Breast Cancer Image [54.83246945407568]
AMEN is formulated in a multi-branch fashion with a pixel-wise attention model and a classification submodule.
To focus more on subtle detail information, the sample image is enhanced by the pixel-wise attention map generated from the former branch, as sketched below.
Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
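The enhancement step can be pictured as re-weighting pixels by the attention map from the previous branch; a minimal sketch (the residual 1 + attention form is an assumption):

```python
import torch

def enhance_with_attention(image: torch.Tensor,
                           attn_map: torch.Tensor) -> torch.Tensor:
    """Boosts attended pixels while keeping the original signal.
    image: (B, C, H, W); attn_map: (B, 1, H, W) with values in [0, 1]."""
    return image * (1.0 + attn_map)  # broadcast over channels
```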
arXiv Detail & Related papers (2020-10-07T08:44:21Z)
- Saliency-driven Class Impressions for Feature Visualization of Deep Neural Networks [55.11806035788036]
It is advantageous to visualize the features considered to be essential for classification.
Existing visualization methods generate high-confidence images consisting of both background and foreground features.
In this work, we propose a saliency-driven approach to visualize discriminative features that are considered most important for a given task.
arXiv Detail & Related papers (2020-07-31T06:11:06Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.