Learning Attentive Pairwise Interaction for Fine-Grained Classification
- URL: http://arxiv.org/abs/2002.10191v1
- Date: Mon, 24 Feb 2020 12:17:56 GMT
- Title: Learning Attentive Pairwise Interaction for Fine-Grained Classification
- Authors: Peiqin Zhuang, Yali Wang, Yu Qiao
- Abstract summary: We propose a simple but effective Attentive Pairwise Interaction Network (API-Net) for fine-grained classification.
API-Net first learns a mutual feature vector to capture semantic differences in the input pair.
It then compares this mutual vector with individual vectors to generate gates for each input image.
We conduct extensive experiments on five popular benchmarks in fine-grained classification.
- Score: 53.66543841939087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained classification is a challenging problem due to subtle
differences among highly confused categories. Most approaches address this
difficulty by learning discriminative representations of individual input images.
On the other hand, humans can effectively identify contrastive clues by
comparing image pairs. Inspired by this fact, this paper proposes a simple but
effective Attentive Pairwise Interaction Network (API-Net), which can
progressively recognize a pair of fine-grained images by interaction.
Specifically, API-Net first learns a mutual feature vector to capture semantic
differences in the input pair. It then compares this mutual vector with
individual vectors to generate gates for each input image. These distinct gate
vectors inherit mutual context on semantic differences, which allow API-Net to
attentively capture contrastive clues by pairwise interaction between two
images. Additionally, we train API-Net in an end-to-end manner with a score
ranking regularization, which further improves the generalization of API-Net by taking feature
priorities into account. We conduct extensive experiments on five popular
benchmarks in fine-grained classification. API-Net outperforms recent
state-of-the-art methods, with accuracies of 90.0% on CUB-200-2011, 93.9% on
Aircraft, 95.3% on Stanford Cars, 90.3% on Stanford Dogs, and 88.1% on NABirds.
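The abstract describes the pairwise interaction only at a high level. The Python sketch below, which uses no dependencies, shows one way the pieces could fit together. It rests on several assumptions that are not stated in the abstract:

- the mutual vector comes from a single linear layer applied to the concatenated pair;
- each gate is sigmoid(x_m ⊙ x_i);
- gated features use the residual form x + x ⊙ g;
- the score ranking regularization is a hinge on the true-class softmax scores, with an illustrative margin of 0.05.

All function names (`pairwise_interaction`, `score_ranking_loss`, `pair_loss`, and so on) are hypothetical. This is not the authors' released code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    # weights has shape (out, in); returns W @ x + b as a list.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mutual_vector(x1, x2, w_m, b_m):
    # ASSUMPTION: one linear layer on the concatenated pair stands in for the
    # network that produces API-Net's mutual vector.
    return linear(w_m, b_m, list(x1) + list(x2))


def gate(x_m, x_i):
    # ASSUMPTION: gate = sigmoid(x_m * x_i), element-wise; channels where x_m and
    # x_i are large with the same sign get gate values close to 1.
    return [sigmoid(m * v) for m, v in zip(x_m, x_i)]


def attend(x_i, g):
    # Residual gating: keep the original feature and boost gated channels.
    return [v + v * gk for v, gk in zip(x_i, g)]


def pairwise_interaction(x1, x2, w_m, b_m):
    x_m = mutual_vector(x1, x2, w_m, b_m)
    g1, g2 = gate(x_m, x1), gate(x_m, x2)
    # Each image is gated by its own gate ("self") and by its partner's ("other").
    return {
        "x1_self": attend(x1, g1), "x1_other": attend(x1, g2),
        "x2_self": attend(x2, g2), "x2_other": attend(x2, g1),
    }


def cross_entropy(probs, label):
    return -math.log(probs[label])


def score_ranking_loss(p_self, p_other, label, margin=0.05):
    # ASSUMPTION: hinge penalty unless the "self"-gated feature scores the true
    # class at least `margin` higher than the "other"-gated feature.
    return max(0.0, p_other[label] - p_self[label] + margin)


def pair_loss(x1, y1, x2, y2, w_m, b_m, w_c, b_c, lam=1.0):
    feats = pairwise_interaction(x1, x2, w_m, b_m)
    probs = {k: softmax(linear(w_c, b_c, v)) for k, v in feats.items()}
    # Cross-entropy on all four gated features, plus the ranking term per image.
    ce = (cross_entropy(probs["x1_self"], y1) + cross_entropy(probs["x1_other"], y1)
          + cross_entropy(probs["x2_self"], y2) + cross_entropy(probs["x2_other"], y2))
    rk = (score_ranking_loss(probs["x1_self"], probs["x1_other"], y1)
          + score_ranking_loss(probs["x2_self"], probs["x2_other"], y2))
    return ce + lam * rk


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy usage: D = 4 feature channels, C = 3 classes, random (untrained) weights.
random.seed(0)
D, C = 4, 3
w_m, b_m = rand_matrix(D, 2 * D), [0.0] * D
w_c, b_c = rand_matrix(C, D), [0.0] * C
x1 = [random.random() for _ in range(D)]
x2 = [random.random() for _ in range(D)]
demo_loss = pair_loss(x1, 0, x2, 1, w_m, b_m, w_c, b_c)
```

Gating each image with both its own gate and its partner's gate yields four features per pair. Under the assumed ranking term, the self-gated feature is asked to be the more confident one on the true class. In a real model, `x1` and `x2` would be backbone features of the two images in a pair, and all weights would be learned end to end.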
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only make it easy to cluster pixels of the same category but also contain good shape information.
We propose to conduct pixel propagation by modeling the pairwise similarities of pixels, spreading the high-confidence pixels and uncovering more of them.
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z)
- Pairwise Comparison Network for Remote Sensing Scene Classification [0.0]
This paper proposes a pairwise comparison network, which contains two main steps: pairwise selection and pairwise representation.
The proposed network first selects similar image pairs, and then represents the image pairs with pairwise representations.
The self-representation is introduced to highlight the informative parts of each image itself, while the mutual-representation is proposed to capture the subtle differences between image pairs.
arXiv Detail & Related papers (2022-05-17T07:31:36Z)
- iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition [33.2800417526215]
Image classification, which classifies images by pre-defined categories, has been the dominant approach to visual representation learning over the last decade.
Visual learning through image-text alignment, however, has emerged to show promising performance, especially for zero-shot recognition.
We propose a deep fusion method with three adaptations that effectively bridge two learning tasks.
arXiv Detail & Related papers (2022-04-22T15:27:21Z)
- Clicking Matters: Towards Interactive Human Parsing [60.35351491254932]
This work is the first attempt to tackle the human parsing task under the interactive setting.
Our IHP solution achieves 85% mIoU on the LIP benchmark, 80% mIoU on PASCAL-Person-Part and CIHP, and 75% mIoU on Helen, with only 1.95, 3.02, 2.84, and 1.09 clicks per class, respectively.
arXiv Detail & Related papers (2021-11-11T11:47:53Z)
- Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge, known as low-shot image recognition, arises when only a few annotated images are available for learning a recognition model for a category.
We propose the Cascaded Feature Matching Network (CFMN) to solve this problem.
Few-shot learning experiments on two standard datasets, miniImageNet and Omniglot, confirm the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances in the form of <human, action, object> in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)