Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning
- URL: http://arxiv.org/abs/2108.00045v1
- Date: Fri, 30 Jul 2021 19:08:44 GMT
- Title: Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning
- Authors: Faisal Alamri and Anjan Dutta
- Abstract summary: We propose an attention-based model in the problem settings of Zero-Shot Learning to learn attributes useful for unseen class recognition.
Our method uses an attention mechanism adapted from Vision Transformer to capture and learn discriminative attributes by splitting images into small patches.
- Score: 11.66422653137002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-Shot Learning (ZSL) aims to recognise unseen object classes, which are
not observed during the training phase. The existing body of works on ZSL
mostly relies on pretrained visual features and lacks the explicit attribute
localisation mechanism on images. In this work, we propose an attention-based
model in the problem settings of ZSL to learn attributes useful for unseen
class recognition. Our method uses an attention mechanism adapted from Vision
Transformer to capture and learn discriminative attributes by splitting images
into small patches. We conduct experiments on three popular ZSL benchmarks
(i.e., AWA2, CUB and SUN) and set new state-of-the-art harmonic mean results
on all three datasets, which illustrates the effectiveness of our proposed
method.
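As a rough illustration of the mechanism the abstract describes (splitting an image into patches and attending over them with ViT-style multi-head self-attention to predict class attributes), the sketch below shows one possible minimal implementation in PyTorch. The layer sizes, the single attention block, the `PatchAttributeEncoder` name, and the 85-dimensional attribute head (sized for AWA2) are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal, hypothetical sketch: patch embedding + one multi-head
# self-attention block + a head that maps the class token to the
# semantic attribute space used for zero-shot classification.
import torch
import torch.nn as nn


class PatchAttributeEncoder(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=768,
                 num_heads=12, num_attributes=85):  # 85 attributes as in AWA2 (assumption)
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding: a strided convolution turns each patch into a token.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # One multi-head self-attention block (a full ViT stacks several).
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Project the class token onto the attribute space.
        self.attr_head = nn.Linear(dim, num_attributes)

    def forward(self, x):                                        # x: (B, 3, H, W)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.norm(tokens)
        attended, _ = self.attn(tokens, tokens, tokens)          # self-attention over patches
        return self.attr_head(attended[:, 0])                    # attribute scores: (B, num_attributes)


# Usage: compare predicted attributes with per-class semantic vectors
# (e.g. of unseen classes) to obtain classification scores.
model = PatchAttributeEncoder()
image = torch.randn(2, 3, 224, 224)
attrs = model(image)                               # (2, 85)
class_attrs = torch.randn(10, 85)                  # per-class attribute vectors
logits = attrs @ class_attrs.t()                   # compatibility scores
```

For reference, the harmonic mean reported for the Generalized ZSL setting is the standard H = 2 * Acc_seen * Acc_unseen / (Acc_seen + Acc_unseen), which rewards balanced accuracy on seen and unseen classes.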
Related papers
- Zero-Shot Learning by Harnessing Adversarial Samples [52.09717785644816]
We propose a novel Zero-Shot Learning (ZSL) approach by Harnessing Adversarial Samples (HAS).
HAS advances ZSL through adversarial training which takes into account three crucial aspects.
We demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios.
arXiv Detail & Related papers (2023-08-01T06:19:13Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z) - UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z) - Implicit and Explicit Attention for Zero-Shot Learning [11.66422653137002]
We propose implicit and explicit attention mechanisms to address the bias problem in Zero-Shot Learning (ZSL) models.
We conduct comprehensive experiments on three popular benchmarks: AWA2, CUB and SUN.
arXiv Detail & Related papers (2021-10-02T18:06:21Z) - Attribute-Modulated Generative Meta Learning for Zero-Shot Classification [52.64680991682722]
We present the Attribute-Modulated generAtive meta-model for Zero-shot learning (AMAZ).
Our model consists of an attribute-aware modulation network and an attribute-augmented generative network.
Our empirical evaluations show that AMAZ improves state-of-the-art methods by 3.8% and 5.1% in ZSL and generalized ZSL settings, respectively.
arXiv Detail & Related papers (2021-04-22T04:16:43Z) - Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z) - Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition [123.62183172631443]
We present Adversarial Self-Supervised Learning (ASSL), a novel framework that tightly couples SSL and the semi-supervised scheme.
Specifically, we design an effective SSL scheme to improve the discrimination capability of learned representations for 3D action recognition.
arXiv Detail & Related papers (2020-07-12T08:01:06Z) - A Biologically Inspired Feature Enhancement Framework for Zero-Shot Learning [18.348568695197553]
This paper proposes a biologically inspired feature enhancement framework for Zero-Shot Learning (ZSL) algorithms.
Specifically, we design a dual-channel learning framework that uses auxiliary datasets to enhance the feature extractor of the ZSL model.
Our proposed method can effectively improve the ability of the ZSL model and achieve state-of-the-art results on three benchmark ZSL tasks.
arXiv Detail & Related papers (2020-05-13T13:25:22Z)