Semantic Prompt for Few-Shot Image Recognition
- URL: http://arxiv.org/abs/2303.14123v1
- Date: Fri, 24 Mar 2023 16:32:19 GMT
- Title: Semantic Prompt for Few-Shot Image Recognition
- Authors: Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan
- Abstract summary: We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
- Score: 76.68959583129335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot learning is a challenging problem since only a few examples are
provided to recognize a new class. Several recent studies exploit additional
semantic information, e.g., text embeddings of class names, to address the
scarcity of samples by combining semantic prototypes with visual prototypes.
However, these methods still suffer from the spurious visual features learned
from the rare support samples, resulting in limited benefits. In this paper, we
propose a novel Semantic Prompt (SP) approach for few-shot learning. Instead of
the naive exploitation of semantic information for remedying classifiers, we
explore leveraging semantic information as prompts to tune the visual feature
extraction network adaptively. Specifically, we design two complementary
mechanisms to insert semantic prompts into the feature extractor: one is to
enable the interaction between semantic prompts and patch embeddings along the
spatial dimension via self-attention; the other is to supplement visual features
with the transformed semantic prompts along the channel dimension. By combining
these two mechanisms, the feature extractor attends more effectively to
class-specific features and obtains more generalized image
representations with merely a few support samples. Through extensive
experiments on four datasets, the proposed approach achieves promising results,
improving the 1-shot learning accuracy by 3.67% on average.
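As a rough illustration of the two prompting mechanisms described in the abstract, the sketch below shows how a class-name embedding might be injected into a ViT-style feature extractor: once as an extra token that interacts with patch embeddings through self-attention (spatial dimension), and once as a channel-wise supplement to the patch features. The module name, dimensions, and projection layers are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumed shapes and layer names), not the paper's official code.
import torch
import torch.nn as nn

class SemanticPromptBlock(nn.Module):
    """Injects a semantic (text) embedding into visual patch features in two ways:
    1) spatial: the projected semantic vector is prepended as an extra token and
       interacts with patch embeddings through self-attention;
    2) channel: the projected semantic vector supplements patch features channel-wise.
    """
    def __init__(self, vis_dim=384, sem_dim=512, num_heads=6):
        super().__init__()
        self.spatial_proj = nn.Linear(sem_dim, vis_dim)   # text -> extra prompt token
        self.channel_proj = nn.Linear(sem_dim, vis_dim)   # text -> channel-wise offset
        self.attn = nn.MultiheadAttention(vis_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(vis_dim)

    def forward(self, patches, sem):
        # patches: (B, N, vis_dim) patch embeddings; sem: (B, sem_dim) class-name embedding
        prompt = self.spatial_proj(sem).unsqueeze(1)       # (B, 1, vis_dim)
        tokens = torch.cat([prompt, patches], dim=1)       # prepend the semantic prompt
        attn_out, _ = self.attn(tokens, tokens, tokens)    # spatial interaction
        tokens = self.norm(tokens + attn_out)
        patches = tokens[:, 1:]                            # drop the prompt token
        # channel-wise supplement: add the transformed semantic vector to every patch
        patches = patches + self.channel_proj(sem).unsqueeze(1)
        return patches

# Usage: fuse a text embedding of the class name with ViT patch features.
block = SemanticPromptBlock()
patch_feats = torch.randn(2, 196, 384)   # e.g. 14x14 patches from a ViT backbone
class_text = torch.randn(2, 512)         # text embedding of the class name
fused = block(patch_feats, class_text)   # (2, 196, 384)
```

A few-shot classifier would typically pool such fused features over the support images of each class to form prototypes; that step is omitted from the sketch.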
Related papers
- Disentangling Dense Embeddings with Sparse Autoencoders [0.0]
Sparse autoencoders (SAEs) have shown promise in extracting interpretable features from complex neural networks.
We present one of the first applications of SAEs to dense text embeddings from large language models.
We show that the resulting sparse representations maintain semantic fidelity while offering interpretability.
arXiv Detail & Related papers (2024-08-01T15:46:22Z)
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
- Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition [49.26065739704278]
We propose a framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition.
An instance-view data hallucination module hallucinates each sample of a novel class to generate new data.
A prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class.
arXiv Detail & Related papers (2024-01-13T12:32:29Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting [77.69172089359606]
We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection.
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning.
We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.
arXiv Detail & Related papers (2022-04-16T16:45:06Z)
- Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition [27.0842107128122]
We devise an attributes-guided attention module (AGAM) to utilize human-annotated attributes and learn more discriminative features.
Our proposed module can significantly improve simple metric-based approaches to achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-09-10T08:38:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.