SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot
Learning
- URL: http://arxiv.org/abs/2111.04316v1
- Date: Mon, 8 Nov 2021 08:03:44 GMT
- Title: SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot
Learning
- Authors: Fengyuan Yang, Ruiping Wang, Xilin Chen
- Abstract summary: We propose SEmantic Guided Attention (SEGA) to teach machines to recognize a new category.
SEGA uses semantic knowledge to guide visual perception in a top-down manner, indicating which visual features should be attended to.
We show that our semantic guided attention realizes its anticipated function and outperforms the state of the art.
- Score: 85.2093650907943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Teaching machines to recognize a new category based on only a few
training samples, especially just one, remains challenging owing to the
incomplete understanding of the novel category caused by the lack of data.
However, humans can learn new classes quickly even from few samples, since
they can tell which discriminative features of each category to focus on based
on both visual and semantic prior knowledge. To better utilize this prior
knowledge, we propose the SEmantic Guided Attention (SEGA) mechanism, in which
semantic knowledge guides visual perception in a top-down manner, indicating
which visual features should be attended to when distinguishing a category
from the others. As a result, the embedding of a novel class can be more
discriminative even with few samples. Concretely, a feature extractor is
trained to embed the few images of each novel class into a visual prototype,
with the help of visual prior knowledge transferred from base classes. We then
learn a network that maps semantic knowledge to category-specific attention
vectors, which are used to perform feature selection and enhance the visual
prototypes. Extensive experiments on miniImageNet, tieredImageNet, CIFAR-FS,
and CUB indicate that our semantic guided attention realizes its anticipated
function and outperforms the state of the art.
Related papers
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence, in which external knowledge is usually incorporated when recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by jointly using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation (a hedged sketch of such a combined objective appears after this list).
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z)
- Automatically Discovering Novel Visual Categories with Self-supervised Prototype Learning [68.63910949916209]
This paper tackles the problem of novel category discovery (NCD), which aims to discriminate unknown categories in large-scale image collections.
We propose a novel adaptive prototype learning method consisting of two main stages: prototypical representation learning and prototypical self-training.
We conduct extensive experiments on four benchmark datasets and demonstrate the effectiveness and robustness of the proposed method with state-of-the-art performance.
arXiv Detail & Related papers (2022-08-01T16:34:33Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image Classification [18.299463254965264]
We propose GAN-CST, a novel zero-shot learning approach based on overlaying class knowledge onto visual feature learning.
The proposed model delivers superior performance over state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-26T06:34:35Z)
- Zero-shot Learning with Deep Neural Networks for Object Recognition [8.572654816871873]
Zero-shot learning deals with the ability to recognize objects without any visual training sample.
This chapter presents a review of the approaches based on deep neural networks to tackle the ZSL problem.
arXiv Detail & Related papers (2021-02-05T12:27:42Z)
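As referenced in the SgVA-CLIP entry above, here is a hedged sketch of a combined objective in the spirit of that summary (a vision-specific contrastive loss, a cross-modal contrastive loss, and a knowledge-distillation term), written in generic PyTorch; the concrete loss forms, temperatures, weights, and the reading of "implicit" distillation are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F


def info_nce(anchor, positive, temperature: float = 0.07):
    # anchor, positive: (N, d) L2-normalized embeddings; matching rows are positives.
    logits = anchor @ positive.t() / temperature
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


def combined_loss(img_feat, img_feat_aug, txt_feat, student_logits, teacher_logits,
                  w_vis=1.0, w_xmod=1.0, w_kd=1.0, kd_temp=4.0):
    img_feat = F.normalize(img_feat, dim=-1)
    img_feat_aug = F.normalize(img_feat_aug, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    # Vision-specific contrastive term: two augmented views of the same image.
    l_vis = info_nce(img_feat, img_feat_aug)
    # Cross-modal contrastive term: image features vs. class text features.
    l_xmod = info_nce(img_feat, txt_feat)
    # Distillation term: adapted (student) logits match a frozen teacher's soft targets.
    l_kd = F.kl_div(
        F.log_softmax(student_logits / kd_temp, dim=-1),
        F.softmax(teacher_logits / kd_temp, dim=-1),
        reduction="batchmean",
    ) * (kd_temp ** 2)
    return w_vis * l_vis + w_xmod * l_xmod + w_kd * l_kd
```

The weighting of the three terms and the choice of teacher (e.g. a frozen vision-language model's zero-shot logits) are design choices left open by the one-line summary.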
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.