Progressive Co-Attention Network for Fine-grained Visual Classification
- URL: http://arxiv.org/abs/2101.08527v1
- Date: Thu, 21 Jan 2021 10:19:02 GMT
- Title: Progressive Co-Attention Network for Fine-grained Visual Classification
- Authors: Tian Zhang, Dongliang Chang, Zhanyu Ma and Jun Guo
- Abstract summary: Fine-grained visual classification aims to recognize images belonging to multiple sub-categories within the same category.
Most existing methods only take an individual image as input.
We propose an effective method called progressive co-attention network (PCA-Net) to tackle this problem.
- Score: 20.838908090777885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-grained visual classification aims to recognize images belonging to
multiple sub-categories within the same category. It is a challenging task due to
the inherently subtle variations among highly-confused categories. Most
existing methods only take an individual image as input, which may limit the
ability of models to recognize contrastive clues from different images. In this
paper, we propose an effective method called progressive co-attention network
(PCA-Net) to tackle this problem. Specifically, we calculate the channel-wise
similarity by interacting the feature channels within same-category images to
capture the common discriminative features. Considering that complementary
information is also crucial for recognition, we erase the prominent areas
enhanced by the channel interaction to force the network to focus on other
discriminative regions. The proposed model can be trained in an end-to-end
manner, and only requires image-level label supervision. It has achieved
competitive results on three fine-grained visual classification benchmark
datasets: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
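As a rough illustration of the two operations the abstract describes (channel interaction between same-category image pairs, followed by erasing of the regions the interaction highlights), a minimal PyTorch sketch might look like the following. The cosine-similarity formulation, the softmax weighting, the channel-mean saliency, and the 0.7 erasing threshold are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn.functional as F


def channel_interaction(feat_a, feat_b):
    """Enhance each image's channels via channel-wise similarity with a
    same-category partner image. feat_a, feat_b: (B, C, H, W) feature maps."""
    b, c, h, w = feat_a.shape
    va = feat_a.view(b, c, -1)                                  # (B, C, HW)
    vb = feat_b.view(b, c, -1)                                  # (B, C, HW)
    # Channel-wise similarity: cosine similarity over spatial positions.
    sim = torch.bmm(F.normalize(va, dim=2),
                    F.normalize(vb, dim=2).transpose(1, 2))     # (B, C, C)
    attn = F.softmax(sim, dim=2)
    # Each channel aggregates the partner's most similar channels,
    # emphasizing features the two images have in common.
    enh_a = feat_a + torch.bmm(attn, vb).view(b, c, h, w)
    enh_b = feat_b + torch.bmm(attn.transpose(1, 2), va).view(b, c, h, w)
    return enh_a, enh_b


def erase_prominent(feat, enhanced, threshold=0.7):
    """Zero out the spatial locations most strongly highlighted by the
    interaction so a second pass must rely on other discriminative regions."""
    saliency = enhanced.mean(dim=1, keepdim=True)               # (B, 1, H, W)
    s_min = saliency.amin(dim=(2, 3), keepdim=True)
    s_max = saliency.amax(dim=(2, 3), keepdim=True)
    saliency = (saliency - s_min) / (s_max - s_min + 1e-6)      # scale to [0, 1]
    mask = (saliency < threshold).float()                       # keep non-prominent areas
    return feat * mask


# Toy usage: backbone features of two same-category images.
fa, fb = torch.randn(4, 512, 14, 14), torch.randn(4, 512, 14, 14)
ea, eb = channel_interaction(fa, fb)
fa_erased = erase_prominent(fa, ea)  # would feed a second classification branch
```

Presumably the enhanced and erased features each drive a classification head, so that image-level labels alone supervise both the common and the complementary cues.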
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Heterogeneous Visible-Thermal and Visible-Infrared Face Recognition using Unit-Class Loss and Cross-Modality Discriminator [0.43748379918040853]
We propose an end-to-end framework for cross-modal face recognition.
A novel Unit-Class Loss is proposed for preserving identity information while discarding modality information.
The proposed network can be used to extract modality-independent vector representations or to perform matching-pair classification for test images.
arXiv Detail & Related papers (2021-11-29T06:14:00Z)
- A Compositional Feature Embedding and Similarity Metric for Ultra-Fine-Grained Visual Categorization [16.843126268445726]
Fine-grained visual categorization (FGVC) aims at classifying objects with small inter-class variances.
This paper proposes a novel compositional feature embedding and similarity metric (CECS) for ultra-fine-grained visual categorization.
Experimental results on two ultra-FGVC datasets and one FGVC dataset, compared against recent benchmark methods, consistently demonstrate that the proposed CECS method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-25T15:05:25Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Learning Discriminative Representations for Multi-Label Image Recognition [13.13795708478267]
We propose a unified deep network to learn discriminative features for the multi-label task.
By regularizing the whole network with the proposed loss, the performance of the well-known ResNet-101 is improved significantly.
arXiv Detail & Related papers (2021-07-23T12:10:46Z)
- Interpretable Attention Guided Network for Fine-grained Visual Classification [36.657203916383594]
Fine-grained visual classification (FGVC) is challenging but more critical than traditional classification tasks.
We propose an Interpretable Attention Guided Network (IAGN) for fine-grained visual classification.
arXiv Detail & Related papers (2021-03-08T12:27:51Z)
- Few-shot Image Classification with Multi-Facet Prototypes [48.583388368897126]
We organize visual features into facets, which intuitively group features of the same kind.
It is possible to predict facet importance from a pre-trained embedding of the category names.
In particular, we propose an adaptive similarity measure, relying on predicted facet importance weights for a given set of categories.
arXiv Detail & Related papers (2021-02-01T12:43:03Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- Channel Interaction Networks for Fine-Grained Image Categorization [61.095320862647476]
Fine-grained image categorization is challenging due to the subtle inter-class differences.
We propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images.
Our model can be trained efficiently in an end-to-end fashion without the need of multi-stage training and testing.
arXiv Detail & Related papers (2020-03-11T11:51:51Z)