Multi-Knowledge Fusion for New Feature Generation in Generalized
Zero-Shot Learning
- URL: http://arxiv.org/abs/2102.11566v1
- Date: Tue, 23 Feb 2021 09:11:05 GMT
- Title: Multi-Knowledge Fusion for New Feature Generation in Generalized
Zero-Shot Learning
- Authors: Hongxin Xiang, Cheng Xie, Ting Zeng, Yun Yang
- Abstract summary: We propose a novel generative ZSL method to learn more generalized features from multi-knowledge with continuously generated new semantics in semantic-to-visual embedding.
We show that our approach can achieve significantly better performance compared to existing state-of-the-art methods on a large number of benchmarks for several ZSL tasks.
- Score: 4.241513887019675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Suffering from semantic insufficiency and domain-shift problems, most
existing state-of-the-art methods fail to achieve satisfactory results for
Zero-Shot Learning (ZSL). To alleviate these problems, we propose a novel
generative ZSL method that learns more generalized features from
multi-knowledge with continuously generated new semantics in semantic-to-visual
embedding. In our approach, the proposed Multi-Knowledge Fusion Network
(MKFNet) takes different semantic features from multi-knowledge as input, which
enables more relevant semantic features to be trained for semantic-to-visual
embedding, and generates more generalized visual features by adaptively fusing
visual features from different knowledge domains. The proposed New Feature
Generator (NFG) with an adaptive genetic strategy enriches semantic information
on the one hand and, on the other, greatly improves the intersection between
the visual features generated by MKFNet and the unseen visual features.
Empirically, we show that our approach achieves significantly better
performance than existing state-of-the-art methods on a large number of
benchmarks for several ZSL tasks, including traditional ZSL, generalized ZSL,
and zero-shot retrieval.
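The abstract characterizes MKFNet as generating visual features from several semantic knowledge sources and fusing them adaptively in a semantic-to-visual embedding. The abstract gives no implementation details, so the sketch below only illustrates that fusion idea: the module names, layer sizes, and the softmax-gated weighting are assumptions, not the authors' actual MKFNet architecture.

```python
import torch
import torch.nn as nn

class MultiKnowledgeFusionSketch(nn.Module):
    """Illustrative sketch (not the authors' MKFNet): one semantic-to-visual
    generator per knowledge source, plus adaptively learned fusion weights."""

    def __init__(self, semantic_dims, visual_dim=2048, hidden_dim=1024):
        super().__init__()
        # One generator per knowledge source (e.g. attributes, word vectors).
        self.generators = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, visual_dim), nn.ReLU(),
            )
            for d in semantic_dims
        )
        # Gate producing one fusion weight per source from the concatenated
        # generated visual features (an assumed fusion mechanism).
        self.gate = nn.Linear(visual_dim * len(semantic_dims), len(semantic_dims))

    def forward(self, semantic_inputs):
        # semantic_inputs: list of tensors, one per knowledge source, each (B, d_k).
        feats = [g(s) for g, s in zip(self.generators, semantic_inputs)]
        stacked = torch.stack(feats, dim=1)                                   # (B, K, V)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)  # (B, K)
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)                  # (B, V)
        return fused

# Usage with two hypothetical knowledge sources: 85-d class attributes
# and 300-d word embeddings, fused into 2048-d visual features.
model = MultiKnowledgeFusionSketch(semantic_dims=[85, 300])
fused_visual = model([torch.randn(4, 85), torch.randn(4, 300)])  # shape: (4, 2048)
```

In a full generative ZSL pipeline, fused features like these would feed a generator and a classifier over seen and unseen classes; the NFG component described above, which enriches semantics via an adaptive genetic strategy, is likewise not reproduced in this sketch.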
Related papers
- Improving vision-language alignment with graph spiking hybrid Networks [6.707524980629404]
This paper proposes a comprehensive visual semantic representation module, which uses panoptic segmentation to generate fine-grained semantic features.
We propose a novel Graph Spiking Hybrid Network (GSHN) that integrates the complementary advantages of Spiking Neural Networks (SNNs) and Graph Attention Networks (GATs) to encode visual semantic information.
arXiv Detail & Related papers (2025-01-31T11:55:17Z) - Optimizing Speech Multi-View Feature Fusion through Conditional Computation [51.23624575321469]
Self-supervised learning (SSL) features provide lightweight and versatile multi-view speech representations.
SSL features conflict with traditional spectral features like FBanks in terms of update directions.
We propose a novel generalized feature fusion framework grounded in conditional computation.
arXiv Detail & Related papers (2025-01-14T12:12:06Z) - Towards Generative Class Prompt Learning for Fine-grained Visual Recognition [5.633314115420456]
Generative Class Prompt Learning and Contrastive Multi-class Prompt Learning are presented.
Generative Class Prompt Learning improves visio-linguistic synergy in class embeddings by conditioning on few-shot exemplars with learnable class prompts.
CoMPLe builds on this foundation by introducing a contrastive learning component that encourages inter-class separation.
arXiv Detail & Related papers (2024-09-03T12:34:21Z) - Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning [114.59476118365266]
We propose AENet, which incorporates semantic information into the visual prompt to distill a semantic-enhanced prompt for visual representation enrichment.
AENet comprises two key steps: 1) exploring the concept-harmonized tokens for the visual and attribute modalities, grounded on the modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding semantic-enhanced prompt via the visual residual refinement unit with attribute consistency supervision.
arXiv Detail & Related papers (2024-06-05T07:59:48Z) - Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models [81.71651422951074]
Chain-of-Spot (CoS) method is a novel approach that enhances feature extraction by focusing on key regions of interest.
This technique allows LVLMs to access more detailed visual information without altering the original image resolution.
Our empirical findings demonstrate a significant improvement in LVLMs' ability to understand and reason about visual content.
arXiv Detail & Related papers (2024-03-19T17:59:52Z) - CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts [11.752632557524969]
We propose contrastive learning with data augmentation to disentangle content features from the original representations.
Our experiments across diverse datasets demonstrate significant improvements in zero-shot and few-shot classification tasks.
arXiv Detail & Related papers (2023-11-28T03:00:59Z) - GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot
Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z) - FREE: Feature Refinement for Generalized Zero-Shot Learning [86.41074134041394]
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias.
Most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks.
We propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE) to tackle the above problem.
arXiv Detail & Related papers (2021-07-29T08:11:01Z) - Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder
with Semantic Concepts [0.9054540533394924]
Recent techniques try to learn a cross-modal mapping between the semantic space and the image space.
We propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space.
Our results show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
arXiv Detail & Related papers (2021-06-26T20:08:37Z) - Exploring Complementary Strengths of Invariant and Equivariant
Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z) - Cross Knowledge-based Generative Zero-Shot Learning Approach with
Taxonomy Regularization [5.280368849852332]
We develop a generative network-based ZSL approach equipped with the proposed Cross Knowledge Learning (CKL) scheme and Taxonomy Regularization (TR).
CKL enables more relevant semantic features to be trained for semantic-to-visual feature embedding in ZSL.
TR significantly improves the intersection with unseen images by using more generalized visual features generated from the generative network.
arXiv Detail & Related papers (2021-01-25T04:38:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.