A Framework to Enhance Generalization of Deep Metric Learning methods
using General Discriminative Feature Learning and Class Adversarial Neural
Networks
- URL: http://arxiv.org/abs/2106.06420v1
- Date: Fri, 11 Jun 2021 14:24:40 GMT
- Title: A Framework to Enhance Generalization of Deep Metric Learning methods
using General Discriminative Feature Learning and Class Adversarial Neural
Networks
- Authors: Karrar Al-Kaabi, Reza Monsefi, Davood Zabihzadeh
- Abstract summary: Metric learning algorithms aim to learn a distance function that brings semantically similar data items together and keeps dissimilar ones at a distance.
Deep Metric Learning (DML) methods automatically extract features from data and learn a non-linear transformation from the input space to a semantic embedding space.
We propose a framework to enhance the generalization power of existing DML methods in a Zero-Shot Learning (ZSL) setting.
- Score: 1.5469452301122175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metric learning algorithms aim to learn a distance function that brings the
semantically similar data items together and keeps dissimilar ones at a
distance. Traditional Mahalanobis distance learning is equivalent to finding a
linear projection. In contrast, Deep Metric Learning (DML) methods
automatically extract features from data and learn a non-linear
transformation from the input space to a semantic embedding space. Recently,
many DML methods have been proposed that aim to enhance the discrimination power of
the learned metric by providing novel sampling strategies or loss functions.
This approach is very helpful when both the training and test examples
come from the same set of categories. However, it is less effective in many
applications of DML such as image retrieval and person re-identification. Here,
the DML should learn general semantic concepts from observed classes and employ
them to rank or identify objects from unseen categories. Neglecting the
generalization ability of the learned representation and focusing only on
learning a more discriminative embedding on the observed classes may lead to
overfitting. To address this limitation, we propose a framework to
enhance the generalization power of existing DML methods in a Zero-Shot
Learning (ZSL) setting by general yet discriminative representation learning
and employing a class adversarial neural network. To learn a more general
representation, we propose to employ feature maps of intermediate layers in a
deep neural network and enhance their discrimination power through an attention
mechanism. In addition, a class adversarial network is used to force the deep
model to seek class-invariant features for the DML task. We evaluate our work
on widely used machine vision datasets in a ZSL setting.
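The abstract's remark that Mahalanobis distance learning is equivalent to finding a linear projection can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the learned metric matrix is factored as M = L^T L, and the matrix L, the vectors x and y, and all function names are hypothetical values chosen for illustration.

```python
import math

def matvec(L, x):
    """Multiply matrix L (a list of rows) by vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def gram(L):
    """Return M = L^T L, which is positive semidefinite by construction."""
    k, d = len(L), len(L[0])
    return [[sum(L[r][i] * L[r][j] for r in range(k)) for j in range(d)]
            for i in range(d)]

def mahalanobis(x, y, M):
    """Compute sqrt((x - y)^T M (x - y))."""
    diff = [a - b for a, b in zip(x, y)]
    m_diff = matvec(M, diff)
    return math.sqrt(sum(d * m for d, m in zip(diff, m_diff)))

def euclidean(x, y):
    """Compute the ordinary Euclidean distance between x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical linear projection from 3 dimensions to 2 (illustration only).
L = [[1.0, 0.5, 0.0],
     [0.0, 2.0, -1.0]]
x = [1.0, 2.0, 3.0]
y = [0.0, -1.0, 1.0]

d_m = mahalanobis(x, y, gram(L))             # distance under M = L^T L
d_e = euclidean(matvec(L, x), matvec(L, y))  # Euclidean distance after projecting
print(d_m, d_e)  # the two values agree up to floating-point error
```

Under this factorization, learning M amounts to learning the projection L. DML methods replace the linear map with a deep network, which is the non-linear transformation the abstract refers to.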
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Deep Metric Learning for Computer Vision: A Brief Overview [4.980117530293724]
Objective functions that optimize deep neural networks play a vital role in creating an enhanced feature representation of the input data.
Deep Metric Learning seeks to develop methods that aim to measure the similarity between data samples.
We will provide an overview of recent progress in this area and discuss state-of-the-art Deep Metric Learning approaches.
arXiv Detail & Related papers (2023-12-01T21:53:36Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
- Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition [2.2082422928825136]
Self-supervised learning is typically used to learn deep feature representations from unlabeled data.
We propose integrating a dynamic time warping algorithm in a latent space to force features to be aligned in a temporal dimension.
The proposed approach has great potential for learning robust feature representations compared to recent SSL baselines.
arXiv Detail & Related papers (2022-10-07T07:51:01Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
Deep Metric Learning aims to learn metric spaces that encode semantic similarities as embedding space distances.
These spaces should be transferable to classes beyond those seen during training.
This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z)
- Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts [0.9054540533394924]
Recent techniques try to learn a cross-modal mapping between the semantic space and the image space.
We propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space.
Our results show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
arXiv Detail & Related papers (2021-06-26T20:08:37Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.