Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot
Recognition
- URL: http://arxiv.org/abs/2009.04724v3
- Date: Wed, 3 Feb 2021 07:26:49 GMT
- Title: Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot
Recognition
- Authors: Siteng Huang, Min Zhang, Yachen Kang, Donglin Wang
- Abstract summary: We devise an attributes-guided attention module (AGAM) to utilize human-annotated attributes and learn more discriminative features.
Our proposed module can significantly improve simple metric-based approaches to achieve state-of-the-art performance.
- Score: 27.0842107128122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of few-shot recognition is to recognize novel categories with a
limited number of labeled examples in each class. To encourage learning from a
supplementary view, recent approaches have introduced auxiliary semantic
modalities into effective metric-learning frameworks that aim to learn a
feature similarity between training samples (support set) and test samples
(query set). However, these approaches only augment the representations of
samples that have available semantics and ignore the query set, forfeiting
potential gains and risking a shift between the combined-modality and the
pure-visual representation. In this paper, we devise an
attributes-guided attention module (AGAM) to utilize human-annotated attributes
and learn more discriminative features. This plug-and-play module enables
visual contents and corresponding attributes to collectively focus on important
channels and regions for the support set. For the query set, whose attributes
are unavailable, feature selection is achieved using visual information alone.
As a result, representations from both sets are improved in a
fine-grained manner. Moreover, an attention alignment mechanism is proposed to
distill knowledge from the guidance of attributes to the pure-visual branch for
samples without attributes. Extensive experiments and analysis show that our
proposed module can significantly improve simple metric-based approaches to
achieve state-of-the-art performance on different datasets and settings.
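The abstract describes two cooperating branches: attributes-guided attention that lets visual features and annotated attributes jointly select important channels and regions for support samples, and an alignment mechanism that distills this guidance into a pure-visual branch for query samples, which lack attributes. The following is a minimal NumPy sketch of the channel-attention half under assumed shapes; the projection matrices and all names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, A = 8, 5, 5, 6          # channels, spatial dims, attribute dim (assumed)

feat = rng.standard_normal((C, H, W))   # a support-image feature map
attrs = rng.standard_normal(A)          # its human-annotated attribute vector

# Hypothetical learned projections (random here, for illustration only).
W_guided = rng.standard_normal((C, C + A)) * 0.1  # attributes-guided branch
W_visual = rng.standard_normal((C, C)) * 0.1      # pure-visual branch

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

pooled = feat.mean(axis=(1, 2))  # global average pool -> (C,)

# Attributes-guided channel attention: visual and attribute cues jointly
# gate the channels (available only for support samples).
att_guided = sigmoid(W_guided @ np.concatenate([pooled, attrs]))

# Pure-visual channel attention: the same selection from visual cues alone
# (all that is available for query samples).
att_visual = sigmoid(W_visual @ pooled)

# Attention alignment: a distillation-style loss pulling the pure-visual
# attention toward the attributes-guided attention.
align_loss = np.mean((att_guided - att_visual) ** 2)

# Refined feature map: channels reweighted by the guided attention.
refined = feat * att_guided[:, None, None]
```

In training, `align_loss` would be added to the metric-learning objective so that, at test time, the pure-visual branch alone can reproduce attribute-like feature selection for queries; the paper's module also includes a spatial (region) attention counterpart omitted here for brevity.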
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z)
- Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- OvarNet: Towards Open-vocabulary Object Attribute Recognition [42.90477523238336]
We propose a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr.
The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes.
We show that recognition of semantic category and attributes is complementary for visual scene understanding.
arXiv Detail & Related papers (2023-01-23T15:59:29Z)
- Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning [5.085461418671174]
We introduce an add-on module to facilitate the injection of the knowledge accounting for spatial cross correlations among the samples.
This in turn results in distilling intra-class information including feature level locations and cross similarities between same-class instances.
arXiv Detail & Related papers (2022-06-07T21:14:52Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
- Shaping Visual Representations with Attributes for Few-Shot Learning [5.861206243996454]
Few-shot recognition aims to recognize novel categories under low-data regimes.
Recent metric-learning based few-shot learning methods have achieved promising performances.
We propose attribute-shaped learning (ASL), which can normalize visual representations to predict attributes for query images.
arXiv Detail & Related papers (2021-12-13T03:16:19Z)
- Selecting Relevant Features from a Multi-domain Representation for Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.