Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image
Classification
- URL: http://arxiv.org/abs/2102.13322v1
- Date: Fri, 26 Feb 2021 06:34:35 GMT
- Title: Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image
Classification
- Authors: Cheng Xie, Ting Zeng, Hongxin Xiang, Keqin Li, Yun Yang, Qing Liu
- Abstract summary: We propose a novel zero-shot learning approach, GAN-CST, based on class-knowledge-to-visual-feature learning.
The proposed model delivers superior performance over state-of-the-art approaches.
- Score: 18.299463254965264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New categories can be discovered by transforming semantic features into
synthesized visual features without corresponding training samples in zero-shot
image classification. Although significant progress has been made in generating
high-quality synthesized visual features using generative adversarial networks,
guaranteeing semantic consistency between the semantic features and visual
features remains very challenging. In this paper, we propose a novel zero-shot
learning approach, GAN-CST, based on class-knowledge-to-visual-feature learning,
to tackle this problem. The approach consists of three parts: class knowledge
overlay, semi-supervised learning, and a triplet loss. It applies class knowledge
overlay (CKO) to obtain knowledge not only from the corresponding class but also
from other classes whose knowledge overlaps with it, ensuring that the
knowledge-to-visual learning process has adequate information for generating
synthesized visual features. The approach also applies a semi-supervised learning
process to re-train the knowledge-to-visual model, which reinforces both
synthesized visual feature generation and new-category prediction. We tabulate
results on a number of benchmark datasets, demonstrating that the proposed model
delivers superior performance over state-of-the-art approaches.
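As a rough illustration of the approach, the sketch below shows how a
knowledge-to-visual generator might be trained with a triplet loss: the
synthesized feature serves as the anchor, a real feature of the same class as
the positive, and a feature from a different class as the negative. This is a
minimal sketch assuming a PyTorch setup; the module names, dimensions, and exact
loss formulation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a knowledge-to-visual generator
# trained with a triplet margin loss. All names and dimensions here are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeToVisualGenerator(nn.Module):
    """Maps a class-knowledge vector (e.g. attributes or a word embedding)
    plus noise to a synthesized visual feature."""

    def __init__(self, knowledge_dim, noise_dim, feature_dim, hidden_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(knowledge_dim + noise_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, feature_dim),
            nn.ReLU(),  # CNN features such as ResNet activations are non-negative
        )

    def forward(self, knowledge, noise):
        return self.net(torch.cat([knowledge, noise], dim=1))


def triplet_step(generator, knowledge, real_feat, other_class_feat,
                 noise_dim=128, margin=1.0):
    """One step of the triplet objective: the synthesized feature is the
    anchor, a real feature of the same class is the positive, and a real
    feature of a different class is the negative."""
    noise = torch.randn(knowledge.size(0), noise_dim)
    synthesized = generator(knowledge, noise)
    return F.triplet_margin_loss(synthesized, real_feat, other_class_feat,
                                 margin=margin)
```

In the full approach, CKO would additionally blend knowledge vectors from
overlapping classes before they enter the generator, and the semi-supervised
stage would re-train the knowledge-to-visual model using its own predictions on
unseen classes; both are omitted here for brevity.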
Related papers
- Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning [56.29097276129473]
We propose a simple yet effective framework, named Learning Prompt with Distribution-based Feature Replay (LP-DiF).
To prevent the learnable prompt from forgetting old knowledge in the new session, we propose a pseudo-feature replay approach.
When progressing to a new session, pseudo-features are sampled from old-class distributions combined with training images of the current session to optimize the prompt.
arXiv Detail & Related papers (2024-01-03T07:59:17Z)
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning [85.2093650907943]
We propose SEmantic Guided Attention (SEGA) to teach machines to recognize a new category.
SEGA uses semantic knowledge to guide the visual perception in a top-down manner about what visual features should be paid attention to.
We show that our semantic guided attention realizes its anticipated function and surpasses state-of-the-art results.
arXiv Detail & Related papers (2021-11-08T08:03:44Z)
- Transductive Zero-Shot Learning by Decoupled Feature Generation [30.664199050468472]
We focus on the transductive setting, in which unlabelled visual data from unseen classes is available.
We propose to decouple the tasks of generating realistic visual features and translating semantic attributes into visual cues.
We present a detailed ablation study to dissect the effect of our proposed decoupling approach, while demonstrating its superiority over the related state-of-the-art.
arXiv Detail & Related papers (2021-02-05T16:17:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.