Related papers: Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

URL: http://arxiv.org/abs/2106.14082v1
Date: Sat, 26 Jun 2021 20:08:37 GMT
Title: Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts
Authors: Nihar Bendre, Kevin Desai and Peyman Najafirad
Abstract summary: Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. We propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space. Our results show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
Score: 0.9054540533394924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the ever-increasing amount of data, the central challenge in multimodal learning involves limitations of labelled samples. For the task of classification, techniques such as meta-learning, zero-shot learning, and few-shot learning showcase the ability to learn information about novel classes based on prior knowledge. Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. However, they tend to ignore the local and global semantic knowledge. To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space. In our approach, we concatenate multimodal data into a single embedding before passing it to the VAE for learning the latent space. We propose the use of a multi-modal loss during the reconstruction of the feature embedding through the decoder. Our approach is capable of correlating modalities and exploiting the local and global semantic knowledge for novel sample predictions. Our experimental results using an MLP classifier on four benchmark datasets show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
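Illustrative sketch: the abstract describes concatenating image features and semantic features into one embedding, encoding it with a VAE, and applying a multi-modal loss to the decoder's reconstruction. The NumPy forward pass below shows one way that pipeline could look. It is not the authors' implementation: the single linear layers, the feature sizes, the squared-error form of the per-modality loss, the weights `w_img` and `w_sem`, and all function names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def mvae_forward(img_feat, sem_feat, params, w_img=1.0, w_sem=1.0):
    """One forward pass of a toy multimodal VAE; returns (loss, z)."""
    # Concatenate the two modalities into a single embedding.
    x = np.concatenate([img_feat, sem_feat], axis=1)
    d_img = img_feat.shape[1]

    # Encoder: linear maps to the mean and log-variance of q(z | x).
    mu = x @ params["W_mu"] + params["b_mu"]
    logvar = x @ params["W_lv"] + params["b_lv"]

    # Reparameterization trick: z = mu + sigma * eps.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps

    # Decoder reconstructs the concatenated embedding from z.
    x_hat = z @ params["W_dec"] + params["b_dec"]

    # Assumed multi-modal loss: one squared-error term per modality.
    rec_img = np.mean(np.sum((x_hat[:, :d_img] - img_feat) ** 2, axis=1))
    rec_sem = np.mean(np.sum((x_hat[:, d_img:] - sem_feat) ** 2, axis=1))

    # KL divergence between q(z | x) and a standard normal prior.
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))

    return w_img * rec_img + w_sem * rec_sem + kl, z

# Hypothetical sizes: 16-d image features, 5-d semantic vectors, 3-d latent.
d_img, d_sem, d_z = 16, 5, 3
W_mu, b_mu = init_linear(d_img + d_sem, d_z)
W_lv, b_lv = init_linear(d_img + d_sem, d_z)
W_dec, b_dec = init_linear(d_z, d_img + d_sem)
params = {"W_mu": W_mu, "b_mu": b_mu, "W_lv": W_lv, "b_lv": b_lv,
          "W_dec": W_dec, "b_dec": b_dec}

img = rng.standard_normal((4, d_img))   # stand-in for extracted image features
sem = rng.standard_normal((4, d_sem))   # stand-in for class attribute vectors
loss, z = mvae_forward(img, sem, params)
print(loss, z.shape)
```

Splitting the reconstruction error by modality keeps the lower-dimensional semantic vector from being swamped by the larger image feature vector; whether the paper weights the terms this way is not stated in the abstract.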

Related papers

A Zero-shot Learning Method Based on Large Language Models for Multi-modal Knowledge Graph Embedding [8.56384109338971]
Zero-shot learning (ZL) is crucial for tasks involving unseen categories, such as natural language processing, image classification, and cross-lingual transfer. We propose ZSLLM, a framework for zero-shot embedding learning of MMKGs using large language models (LLMs).
arXiv Detail & Related papers (2025-03-10T11:38:21Z)
Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods. MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections. Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z)
Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans. We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information. We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings. We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks. We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space. We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities. The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
arXiv Detail & Related papers (2021-04-26T15:55:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
