TOAN: Target-Oriented Alignment Network for Fine-Grained Image
Categorization with Few Labeled Samples
- URL: http://arxiv.org/abs/2005.13820v2
- Date: Wed, 10 Mar 2021 05:40:46 GMT
- Title: TOAN: Target-Oriented Alignment Network for Fine-Grained Image
Categorization with Few Labeled Samples
- Authors: Huaxi Huang, Junjie Zhang, Jian Zhang, Qiang Wu, Chang Xu
- Abstract summary: We propose a Target-Oriented Alignment Network (TOAN) to investigate the fine-grained relation between the target query image and support classes.
The features of each support image are transformed to match those of the query in the embedding feature space, which explicitly reduces the disparity within each category.
- Score: 25.68199820110267
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The challenges of high intra-class variance yet low inter-class fluctuations
in fine-grained visual categorization are more severe with few labeled samples,
i.e., Fine-Grained categorization problems under the Few-Shot setting
(FGFS). High-order features are usually developed to uncover subtle differences
between sub-categories in FGFS, but they are less effective in handling the
high intra-class variance. In this paper, we propose a Target-Oriented
Alignment Network (TOAN) to investigate the fine-grained relation between the
target query image and support classes. The features of each support image are
transformed to match those of the query in the embedding feature space, which
explicitly reduces the disparity within each category. Moreover, unlike
existing FGFS approaches, which devise high-order features over the global image
with less explicit consideration of discriminative parts, we generate
discriminative fine-grained features by integrating compositional concept
representations into global second-order pooling. Extensive experiments are
conducted on four fine-grained benchmarks to demonstrate the effectiveness of
TOAN compared with the state-of-the-art models.
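The two ideas named in the abstract, transforming support features to match the query and pooling second-order statistics, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the scaled dot-product attention used for alignment, the signed square-root normalization, and the feature shapes (49 positions, 16 channels) are all assumptions chosen for illustration, and the compositional concept representations are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def align_support_to_query(support, query):
    """Re-express support features at the query's spatial positions.

    support, query: arrays of shape (N, C), i.e. N spatial positions
    with C channels each. Returns an (N, C) array.
    Assumption: scaled dot-product attention stands in for the
    alignment step described in the abstract.
    """
    scale = np.sqrt(query.shape[1])
    attn = softmax(query @ support.T / scale, axis=1)  # (N_query, N_support)
    return attn @ support

def second_order_pool(feat):
    """Global second-order (bilinear) pooling of an (N, C) feature map.

    Returns a flattened C*C descriptor with signed square-root and
    L2 normalization (a common choice, assumed here).
    """
    gram = feat.T @ feat / feat.shape[0]          # (C, C) channel co-occurrence
    gram = np.sign(gram) * np.sqrt(np.abs(gram))  # signed square root
    vec = gram.ravel()
    return vec / (np.linalg.norm(vec) + 1e-12)

# Hypothetical features standing in for a CNN backbone's output.
rng = np.random.default_rng(0)
query = rng.standard_normal((49, 16))
support = rng.standard_normal((49, 16))

aligned = align_support_to_query(support, query)
# Cosine similarity between the pooled descriptors.
score = float(second_order_pool(aligned) @ second_order_pool(query))
```

Because the attention weights sum over support positions, the aligned output does not depend on the spatial ordering of the support features, which is one way such an alignment can reduce pose-related intra-class variance.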
Related papers
- EIANet: A Novel Domain Adaptation Approach to Maximize Class Distinction with Neural Collapse Principles [15.19374752514876]
Source-free domain adaptation (SFDA) aims to transfer knowledge from a labelled source domain to an unlabelled target domain.
A major challenge in SFDA is deriving accurate categorical information for the target domain.
We introduce a novel ETF-Informed Attention Network (EIANet) to separate class prototypes.
arXiv Detail & Related papers (2024-07-23T05:31:05Z)
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
arXiv Detail & Related papers (2023-08-04T09:11:07Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- R2-Trans: Fine-Grained Visual Categorization with Redundancy Reduction [21.11038841356125]
Fine-grained visual categorization (FGVC) aims to discriminate between similar subcategories; its main challenges are large intra-class diversity and subtle inter-class differences.
We present a novel approach for FGVC, which can simultaneously make use of partial yet sufficient discriminative information in environmental cues and also compress the redundant information in class-token with respect to the target.
arXiv Detail & Related papers (2022-04-21T13:35:38Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets show sharp improvements over prior work.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- A Compositional Feature Embedding and Similarity Metric for Ultra-Fine-Grained Visual Categorization [16.843126268445726]
Fine-grained visual categorization (FGVC) aims at classifying objects with small inter-class variances.
This paper proposes a novel compositional feature embedding and similarity metric (CECS) for ultra-fine-grained visual categorization.
Experimental results on two ultra-FGVC datasets and one FGVC dataset with recent benchmark methods consistently demonstrate that the proposed CECS method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-25T15:05:25Z)
- Channel DropBlock: An Improved Regularization Method for Fine-Grained Visual Classification [58.07257910065007]
Existing approaches mainly tackle this problem by introducing attention mechanisms to locate the discriminative parts or feature encoding approaches to extract the highly parameterized features in a weakly-supervised fashion.
In this work, we propose a lightweight yet effective regularization method named Channel DropBlock (CDB) in combination with two alternative correlation metrics, to address this problem.
arXiv Detail & Related papers (2021-06-07T09:03:02Z)
- Label Geometry Aware Discriminator for Conditional Generative Networks [40.89719383597279]
Conditional Generative Adversarial Networks (GANs) can generate highly photorealistic images with desired target classes.
These synthetic images have not always been helpful to improve downstream supervised tasks such as image classification.
arXiv Detail & Related papers (2021-05-12T08:17:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.