Goal-Oriented Gaze Estimation for Zero-Shot Learning
- URL: http://arxiv.org/abs/2103.03433v1
- Date: Fri, 5 Mar 2021 02:14:57 GMT
- Title: Goal-Oriented Gaze Estimation for Zero-Shot Learning
- Authors: Yang Liu, Lei Zhou, Xiao Bai, Yifei Huang, Lin Gu, Jun Zhou, Tatsuya
Harada
- Abstract summary: We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to obtain the visual attention regions for recognizing a novel object guided by its attribute description.
This work suggests the promising benefits of collecting human gaze datasets and developing automatic gaze estimation algorithms for high-level computer vision tasks.
- Score: 62.52340838817908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot learning (ZSL) aims to recognize novel classes by transferring
semantic knowledge from seen classes to unseen classes. Since semantic
knowledge is built on attributes shared between different classes, which are
highly local, a strong prior for localizing object attributes is beneficial
for visual-semantic embedding. Interestingly, when recognizing unseen images,
humans also automatically gaze at regions that carry certain semantic clues.
Therefore, we introduce a novel goal-oriented gaze estimation module (GEM) to
improve the discriminative attribute localization based on the class-level
attributes for ZSL. We aim to predict the actual human gaze location to obtain
the visual attention regions for recognizing a novel object guided by its
attribute description. Specifically, the task-dependent attention is learned with the
goal-oriented GEM, and the global image features are simultaneously optimized
with the regression of local attribute features. Experiments on three ZSL
benchmarks, i.e., CUB, SUN and AWA2, show the superiority or competitiveness of
our proposed method against state-of-the-art ZSL methods. The ablation
analysis on the real gaze dataset CUB-VWSW also validates the benefits and accuracy of
our gaze estimation module. This work suggests the promising benefits of
collecting human gaze datasets and developing automatic gaze estimation algorithms for
high-level computer vision tasks. The code is available at
https://github.com/osierboy/GEM-ZSL.
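As described, the module learns task-dependent attention from class-level attributes while regressing local attribute features alongside a global embedding. Below is a minimal PyTorch sketch of an attribute-guided, gaze-like attention module in this spirit; the class name, dimensions, and scaled dot-product formulation are illustrative assumptions, not the authors' implementation (see the repository above for the real code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeLikeAttention(nn.Module):
    """Attribute queries attend over a CNN feature map, yielding gaze-like
    attention maps, local attribute features, and a global attribute score."""
    def __init__(self, feat_dim=512, attr_dim=312):
        super().__init__()
        self.attr_proj = nn.Linear(attr_dim, feat_dim)    # attributes -> visual space
        self.global_head = nn.Linear(feat_dim, attr_dim)  # global feature -> attribute space

    def forward(self, feat_map, attr_emb):
        # feat_map: (B, C, H, W) backbone features; attr_emb: (A, attr_dim)
        B, C, H, W = feat_map.shape
        queries = self.attr_proj(attr_emb)                 # (A, C)
        flat = feat_map.flatten(2)                         # (B, C, H*W)
        attn = torch.einsum('ac,bcl->bal', queries, flat)  # attribute-to-location similarity
        attn = F.softmax(attn / C ** 0.5, dim=-1)          # (B, A, H*W) gaze-like maps
        local = torch.einsum('bal,bcl->bac', attn, flat)   # (B, A, C) local attribute features
        global_attrs = self.global_head(flat.mean(-1))     # (B, attr_dim) global regression
        return attn.view(B, -1, H, W), local, global_attrs
```

In training, one could supervise the attention maps with real gaze annotations (e.g., CUB-VWSW) and regress the attribute scores onto class-level attribute vectors; this pairing is a plausible reading of the abstract, not a confirmed training recipe.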
Related papers
- Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationships among attributes for knowledge transfer.
Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion.
For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
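As a rough illustration of how multi-level feature fusion can feed visual-semantic attention, here is a hedged sketch of a DAB-flavored block; all layer names, sizes, and the sigmoid gating are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionSketch(nn.Module):
    """Fuses a low-level and a high-level feature map, then gates the result
    with a semantic vector (a loose reading of the DAB description)."""
    def __init__(self, low_dim=256, high_dim=512, sem_dim=312):
        super().__init__()
        self.fuse = nn.Conv2d(low_dim + high_dim, high_dim, kernel_size=1)
        self.sem_proj = nn.Linear(sem_dim, high_dim)

    def forward(self, low_feat, high_feat, sem_vec):
        # low_feat: (B, 256, 2H, 2W); high_feat: (B, 512, H, W); sem_vec: (B, 312)
        low = F.adaptive_avg_pool2d(low_feat, high_feat.shape[-2:])
        fused = self.fuse(torch.cat([low, high_feat], dim=1))   # multi-level fusion
        q = self.sem_proj(sem_vec)[:, :, None, None]            # (B, 512, 1, 1)
        gate = torch.sigmoid((fused * q).sum(1, keepdim=True))  # semantic attention map
        return fused * gate
```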
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
- `Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning [47.1040786932317]
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
We take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features.
A learned function maps the region features to region-specific attribute attention used to construct class part prototypes.
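The mechanism described, a learned function from region features to region-specific attribute attention aggregated into part prototypes, can be sketched compactly. This is not VINVL or the paper's code; the shapes and the single linear mapping are assumptions.

```python
import torch
import torch.nn as nn

class PartPrototypeSketch(nn.Module):
    """Maps detector region features to per-attribute attention over regions,
    then forms attention-weighted part prototypes."""
    def __init__(self, region_dim=2048, num_attrs=312):
        super().__init__()
        self.attn_fn = nn.Linear(region_dim, num_attrs)  # the 'learned function'

    def forward(self, regions):
        # regions: (B, R, region_dim), e.g. from a pre-trained detector
        attn = torch.softmax(self.attn_fn(regions), dim=1)        # (B, R, A) region weights per attribute
        prototypes = torch.einsum('bra,brd->bad', attn, regions)  # (B, A, region_dim)
        return prototypes
```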
arXiv Detail & Related papers (2024-04-12T18:37:00Z)
- Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification [26.340737217001497]
Zero-shot learning (ZSL) allows for identifying novel classes that are not seen during training.
Previous ZSL models mainly depend on manually-labeled attributes or word embeddings extracted from language models to transfer knowledge from seen classes to novel classes.
We propose to collect visually detectable attributes automatically. We predict attributes for each class by modeling the semantic-visual similarity between attributes and images.
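A minimal sketch of scoring class attributes by semantic-visual similarity, assuming a learned linear projection from the text space to the visual space; the function and parameter names are illustrative, not the paper's API.

```python
import torch
import torch.nn.functional as F

def predict_class_attributes(image_feats, attr_embs, proj):
    # image_feats: (N, D_img) features of one class's images
    # attr_embs:   (A, D_txt) attribute embeddings from a language model
    # proj:        nn.Linear(D_txt, D_img) mapping text to the visual space
    a = F.normalize(proj(attr_embs), dim=-1)  # (A, D_img)
    v = F.normalize(image_feats, dim=-1)      # (N, D_img)
    sim = v @ a.t()                           # (N, A) cosine semantic-visual similarity
    return sim.mean(0)                        # (A,) per-class attribute scores
```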
arXiv Detail & Related papers (2024-02-03T09:18:49Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- Attribute Prototype Network for Any-Shot Learning [113.50220968583353]
We argue that an image representation with integrated attribute localization ability would be beneficial for any-shot, i.e. zero-shot and few-shot, image classification tasks.
We propose a novel representation learning framework that jointly learns global and local features using only class-level attributes.
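One way to read "jointly learns global and local features using only class-level attributes" is as a combined objective: a classification loss on global features plus an attribute-regression loss on locally pooled features. The sketch below is that reading; the loss weight and heads are assumptions.

```python
import torch
import torch.nn.functional as F

def joint_loss(global_feat, local_attr_pred, class_attr, labels, cls_head, lam=0.1):
    # global_feat:     (B, C) pooled image features
    # local_attr_pred: (B, A) attribute scores pooled from local attention maps
    # class_attr:      (K, A) class-level attribute matrix; labels: (B,) class ids
    l_cls = F.cross_entropy(cls_head(global_feat), labels)    # global classification
    l_attr = F.mse_loss(local_attr_pred, class_attr[labels])  # local attribute regression
    return l_cls + lam * l_attr
```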
arXiv Detail & Related papers (2022-04-04T02:25:40Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning [11.66422653137002]
We propose an attention-based model in the Zero-Shot Learning setting to learn attributes useful for unseen class recognition.
Our method uses an attention mechanism adapted from Vision Transformer to capture and learn discriminative attributes by splitting images into small patches.
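The patch-splitting idea maps directly onto standard ViT machinery. A minimal sketch, assuming a single self-attention layer and a linear attribute head (the paper's exact patch size, depth, and head count are not specified here):

```python
import torch
import torch.nn as nn

class PatchAttributeSketch(nn.Module):
    """Splits an image into patches, applies multi-head self-attention,
    and predicts attribute scores from the attended tokens."""
    def __init__(self, patch=16, dim=256, heads=8, num_attrs=312):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, num_attrs)

    def forward(self, x):
        # x: (B, 3, H, W) with H, W divisible by the patch size
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        attended, _ = self.attn(tokens, tokens, tokens)    # multi-head self-attention
        return self.head(attended.mean(1))                 # (B, num_attrs) attribute scores
```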
arXiv Detail & Related papers (2021-07-30T19:08:44Z)
- Attribute Prototype Network for Zero-Shot Learning [113.50220968583353]
We propose a novel zero-shot representation learning framework that jointly learns discriminative global and local features.
Our model points to the visual evidence of the attributes in an image, confirming the improved attribute localization ability of our image representation.
arXiv Detail & Related papers (2020-08-19T06:46:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.