Integrating Language Guidance into Vision-based Deep Metric Learning
- URL: http://arxiv.org/abs/2203.08543v1
- Date: Wed, 16 Mar 2022 11:06:50 GMT
- Title: Integrating Language Guidance into Vision-based Deep Metric Learning
- Authors: Karsten Roth, Oriol Vinyals, Zeynep Akata
- Abstract summary: Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances.
These spaces should be transferable to classes beyond those seen during training.
Ignoring higher-level semantic relations between classes causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes.
- Score: 78.18860829585182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Metric Learning (DML) proposes to learn metric spaces which encode
semantic similarities as embedding space distances. These spaces should be
transferable to classes beyond those seen during training. Commonly, DML
methods task networks to solve contrastive ranking tasks defined over binary
class assignments. However, such approaches ignore higher-level semantic
relations between the actual classes. This causes learned embedding spaces to
encode incomplete semantic context and misrepresent the semantic relation
between classes, impacting the generalizability of the learned metric space. To
tackle this issue, we propose a language guidance objective for visual
similarity learning. Leveraging language embeddings of expert- and
pseudo-classnames, we contextualize and realign visual representation spaces
corresponding to meaningful language semantics for better semantic consistency.
Extensive experiments and ablations provide a strong motivation for our
proposed approach and show language guidance offering significant,
model-agnostic improvements for DML, achieving competitive and state-of-the-art
results on all benchmarks. Code available at
https://github.com/ExplainableML/LanguageGuidance_for_DML.
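The language guidance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that). It assumes one plausible formulation: class-name embeddings define a target similarity distribution over the batch, and the visual similarity distribution is pulled toward it with a row-wise KL divergence. The function names, the use of cosine similarity, the `temperature` parameter, and the KL direction are all assumptions made for illustration. Random vectors stand in for the language embeddings, which in practice would come from a pretrained language model.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_similarity_matrix(x):
    # Pairwise cosine similarities between the rows of x.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T


def language_guidance_loss(visual_emb, class_text_emb, labels, temperature=1.0):
    """Hypothetical helper: mean row-wise KL(language || visual).

    visual_emb:     (batch, d_vis) image embeddings from the DML network
    class_text_emb: (num_classes, d_txt) embeddings of the class names
    labels:         (batch,) integer class index of each image
    """
    vis_sim = cosine_similarity_matrix(visual_emb)
    # Look up the class-name embedding for every sample in the batch.
    lang_sim = cosine_similarity_matrix(class_text_emb[labels])
    p_lang = softmax(lang_sim / temperature, axis=1)
    log_p_vis = np.log(softmax(vis_sim / temperature, axis=1))
    kl = (p_lang * (np.log(p_lang) - log_p_vis)).sum(axis=1)
    return float(kl.mean())


# Toy usage: 3 classes, 5 images, random stand-in embeddings.
rng = np.random.default_rng(0)
class_text_emb = rng.normal(size=(3, 8))
labels = np.array([0, 1, 2, 1, 0])
visual_emb = rng.normal(size=(5, 16))
print(language_guidance_loss(visual_emb, class_text_emb, labels))
```

In a training setup, a term like this would typically be added to a standard DML ranking loss rather than replace it, so that class discrimination and language-derived semantic structure are optimized together.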
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections on the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint [6.880579537300643]
Current disentangled representation learning methods suffer from semantic leakage.
We propose a novel training objective, ORthogonAlity Constraint LEarning (ORACLE).
ORACLE builds upon two components: intra-class clustering and inter-class separation.
We demonstrate that training with the ORACLE objective effectively reduces semantic leakage and enhances semantic alignment within the embedding space.
arXiv Detail & Related papers (2024-09-24T02:01:52Z)
- MINERS: Multilingual Language Models as Semantic Retrievers [23.686762008696547]
This paper introduces MINERS, a benchmark designed to evaluate the ability of multilingual language models in semantic retrieval tasks.
We create a comprehensive framework to assess the robustness of LMs in retrieving samples across over 200 diverse languages.
Our results demonstrate that solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2024-06-11T16:26:18Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations [38.56175462620892]
Large pretrained multilingual language models (ML-LMs) have shown remarkable capabilities of zero-shot cross-lingual transfer.
We present a novel view of projecting away language-specific factors from a multilingual embedding space.
We show that applying our method consistently leads to improvements over commonly used ML-LMs.
arXiv Detail & Related papers (2024-01-11T09:54:11Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual Spoken Language Understanding [74.39024160277809]
We present the Global-Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming.
Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance.
GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.
arXiv Detail & Related papers (2022-04-18T13:56:58Z)
- A Framework to Enhance Generalization of Deep Metric Learning methods using General Discriminative Feature Learning and Class Adversarial Neural Networks [1.5469452301122175]
Metric learning algorithms aim to learn a distance function that brings semantically similar data items together and keeps dissimilar ones at a distance.
Deep Metric Learning (DML) methods automatically extract features from data and learn a non-linear transformation from the input space to a semantic embedding space.
We propose a framework to enhance the generalization power of existing DML methods in a Zero-Shot Learning (ZSL) setting.
arXiv Detail & Related papers (2021-06-11T14:24:40Z)