GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot
Learning
- URL: http://arxiv.org/abs/2309.00923v2
- Date: Thu, 14 Sep 2023 14:05:02 GMT
- Title: GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot
Learning
- Authors: Ziming Liu, Jingcai Guo, Xiaocheng Lu, Song Guo, Peiran Dong, Jiewei
Zhang
- Abstract summary: This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
We propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to make full use of such properties and enable a more accurate and robust visual-semantic projection.
Experiments on the large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods by large margins.
- Score: 24.075034737719776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates a challenging problem of zero-shot learning in the
multi-label scenario (MLZSL), wherein a model is trained to recognize
multiple unseen classes within a sample (e.g., an image) based on seen classes
and auxiliary knowledge, e.g., semantic information. Existing methods usually
analyze the relationships among the various seen classes residing in a
sample along the spatial or semantic dimension and transfer the learned model
to unseen classes. However, they ignore the effective integration of local and
global features. That is, in the process of inferring unseen classes, global
features represent the principal direction of the image in the feature space,
while local features should maintain their uniqueness within a certain range.
Neglecting this integration causes the model to lose its grasp of the main
components of the image, and relying only on the local presence of seen classes
during inference introduces unavoidable bias. In this paper, we propose a novel
and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to
make full use of these properties and enable a more accurate and robust
visual-semantic projection. Specifically, we split the feature maps into
several feature groups, each of which is trained independently with the Local
Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a
Global Enhancement Module (GEM) is designed to preserve the principal
direction. In addition, a static graph structure is designed to model the
correlations among local features. Experiments on the large-scale MLZSL
benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed
GBE-MLZSL outperforms other state-of-the-art methods by large margins.
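The abstract describes the architecture only at a high level. The following PyTorch sketch is a minimal illustration of the group bi-enhancement idea under stated assumptions: backbone feature maps are chunked into channel groups, each group passes through an independent local branch (a stand-in for LID), a pooled global branch stands in for GEM, and the two are fused before projection into the semantic space. The layer choices, group count, fusion by concatenation, and the omission of the static graph over local features are illustrative assumptions, not the paper's actual design.
```python
# Minimal, illustrative sketch of the group-splitting idea from the abstract.
# LID/GEM stand-ins, group count, and fusion scheme are assumptions for illustration.
import torch
import torch.nn as nn


class GroupBiEnhancementSketch(nn.Module):
    def __init__(self, in_channels: int = 2048, num_groups: int = 4, embed_dim: int = 300):
        super().__init__()
        assert in_channels % num_groups == 0
        group_channels = in_channels // num_groups
        # Hypothetical LID stand-in: one independent branch per feature group,
        # so each group's local information is processed separately.
        self.local_branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(group_channels, group_channels, kernel_size=1),
                nn.BatchNorm2d(group_channels),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_groups)
        ])
        # Hypothetical GEM stand-in: a global branch that summarizes the whole
        # feature map (its "principal direction") via global average pooling.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, embed_dim),
        )
        # Fuse pooled local groups with the global embedding and project into
        # the semantic (e.g., word-vector) space.
        self.project = nn.Linear(in_channels + embed_dim, embed_dim)
        self.num_groups = num_groups

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) feature maps from a CNN backbone.
        groups = torch.chunk(feats, self.num_groups, dim=1)            # split along channels
        local = [branch(g) for branch, g in zip(self.local_branches, groups)]
        local = torch.cat([g.mean(dim=(2, 3)) for g in local], dim=1)  # (B, C)
        global_emb = self.global_branch(feats)                         # (B, embed_dim)
        return self.project(torch.cat([local, global_emb], dim=1))     # (B, embed_dim)


if __name__ == "__main__":
    x = torch.randn(2, 2048, 7, 7)  # dummy backbone features
    print(GroupBiEnhancementSketch()(x).shape)  # torch.Size([2, 300])
```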
Related papers
- Epsilon: Exploring Comprehensive Visual-Semantic Projection for Multi-Label Zero-Shot Learning [23.96220607033524]
This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
A model is trained to recognize multiple unseen classes within a sample based on seen classes and auxiliary knowledge.
We propose a novel and comprehensive visual-semantic framework for MLZSL, dubbed Epsilon, to fully make use of such properties.
arXiv Detail & Related papers (2024-08-22T09:45:24Z)
- `Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning [47.1040786932317]
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
We take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features.
A learned function maps the region features to region-specific attribute attention used to construct class part prototypes.
arXiv Detail & Related papers (2024-04-12T18:37:00Z)
- Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- FREE: Feature Refinement for Generalized Zero-Shot Learning [86.41074134041394]
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias.
Most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks.
We propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE) to tackle the above problem.
arXiv Detail & Related papers (2021-07-29T08:11:01Z)
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to better learn an image-level representation.
The local feature matching contrastive module is designed to learn representations of local regions, which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
- Attribute-Modulated Generative Meta Learning for Zero-Shot Classification [52.64680991682722]
We present the Attribute-Modulated generAtive meta-model for Zero-shot learning (AMAZ).
Our model consists of an attribute-aware modulation network and an attribute-augmented generative network.
Our empirical evaluations show that AMAZ improves state-of-the-art methods by 3.8% and 5.1% in ZSL and generalized ZSL settings, respectively.
arXiv Detail & Related papers (2021-04-22T04:16:43Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to get the visual attention regions for recognizing a novel object guided by attribute description.
This work implies the promising benefits of collecting human gaze dataset and automatic gaze estimation algorithms on high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)