Rethinking Deep Contrastive Learning with Embedding Memory
- URL: http://arxiv.org/abs/2103.14003v1
- Date: Thu, 25 Mar 2021 17:39:34 GMT
- Title: Rethinking Deep Contrastive Learning with Embedding Memory
- Authors: Haozhi Zhang, Xun Wang, Weilin Huang, Matthew R. Scott
- Abstract summary: Pair-wise loss functions have been extensively studied and shown to continuously improve the performance of deep metric learning (DML).
We provide a new methodology for systematically studying weighting strategies of various pair-wise loss functions, and rethink pair weighting with an embedding memory.
- Score: 58.66613563148031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pair-wise loss functions have been extensively studied and shown to
continuously improve the performance of deep metric learning (DML). However,
they are primarily designed with intuition based on simple toy examples, and
experimentally identifying the truly effective design is difficult in
complicated, real-world cases. In this paper, we provide a new methodology for
systematically studying weighting strategies of various pair-wise loss
functions, and rethink pair weighting with an embedding memory. We delve into
the weighting mechanisms by decomposing the pair-wise functions, and study
positive and negative weights separately using direct weight assignment. This
allows us to study various weighting functions deeply and systematically via
weight curves, and identify a number of meaningful, comprehensive and
insightful facts, which lead to our key observation on memory-based DML: it is
critical to mine hard negatives and discard easy negatives, which are less
informative and redundant, whereas weighting positive pairs is not helpful. This
results in an efficient but surprisingly simple rule to design the weighting
scheme, making it significantly different from existing mini-batch based
methods which design various sophisticated loss functions to weight pairs
carefully. Finally, we conduct extensive experiments on three large-scale
visual retrieval benchmarks, and demonstrate the superiority of memory-based
DML over recent mini-batch based approaches, by using a simple contrastive loss
with momentum-updated memory.
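For illustration, below is a minimal PyTorch-style sketch of the kind of memory-based contrastive loss the abstract describes: a momentum-updated embedding memory, hard negatives mined above a similarity margin, easy negatives discarded, and positive pairs left unweighted. The class-level memory slots, the margin, and the momentum value are illustrative assumptions chosen for brevity, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


class MomentumMemoryContrastiveLoss(torch.nn.Module):
    """Hypothetical sketch of a memory-based contrastive loss.

    A memory bank holds one momentum-updated embedding per class; negatives
    whose similarity falls below the margin (easy negatives) are discarded,
    and positive pairs receive no extra weighting.
    """

    def __init__(self, num_classes: int, dim: int, margin: float = 0.5, momentum: float = 0.5):
        super().__init__()
        self.margin = margin
        self.momentum = momentum
        # Unit-norm memory slots, one per class (an illustrative simplification).
        self.register_buffer("memory", F.normalize(torch.randn(num_classes, dim), dim=1))

    @torch.no_grad()
    def _update_memory(self, embeddings: torch.Tensor, labels: torch.Tensor) -> None:
        # Momentum update of the memory slots touched by this mini-batch.
        for emb, y in zip(embeddings, labels):
            slot = self.momentum * self.memory[y] + (1.0 - self.momentum) * emb
            self.memory[y] = F.normalize(slot, dim=0)

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        embeddings = F.normalize(embeddings, dim=1)
        sim = embeddings @ self.memory.t()                      # (B, num_classes) cosine similarities
        pos_mask = F.one_hot(labels, self.memory.size(0)).bool()

        # Positive pairs: plain, unweighted pull toward the class memory slot.
        pos_loss = (1.0 - sim[pos_mask]).mean()

        # Negative pairs: keep only hard negatives above the margin and
        # discard easy, uninformative ones entirely.
        neg_sim = sim.masked_fill(pos_mask, float("-inf"))
        hard_negs = neg_sim[neg_sim > self.margin]
        neg_loss = hard_negs.mean() if hard_negs.numel() > 0 else sim.new_zeros(())

        self._update_memory(embeddings.detach(), labels)
        return pos_loss + neg_loss
```

Gradients flow only through the current mini-batch embeddings; the memory is updated without gradient, which is what keeps the scheme cheap compared with carefully weighting every pair inside the mini-batch.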
Related papers
- Dissecting Misalignment of Multimodal Large Language Models via Influence Function [12.832792175138241]
We introduce the Extended Influence Function for Contrastive Loss (ECIF), an influence function crafted for contrastive loss.
ECIF considers both positive and negative samples and provides a closed-form approximation of contrastive learning models.
Building upon ECIF, we develop a series of algorithms for data evaluation in MLLM, misalignment detection, and misprediction trace-back tasks.
arXiv Detail & Related papers (2024-11-18T15:45:41Z)
- Multi-Margin Cosine Loss: Proposal and Application in Recommender Systems [0.0]
Collaborative filtering-based deep learning techniques have regained popularity due to their straightforward nature.
These systems consist of three main components: an interaction module, a loss function, and a negative sampling strategy.
The proposed Multi-Margin Cosine Loss (MMCL) addresses these challenges by introducing multiple margins and varying weights for negative samples.
arXiv Detail & Related papers (2024-05-07T18:58:32Z)
- When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning [83.8758881342346]
A novel loss function is devised to generate adversarial perturbations that could achieve both visual and measure imperceptibility.
Experiments on large-scale benchmark datasets demonstrate the superiority of our proposed method in attacking the top-$k$ multi-label systems.
arXiv Detail & Related papers (2023-07-27T13:18:47Z)
- Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework [43.76337849044254]
Self-supervised learning has shown its great potential to extract powerful visual representations without human annotations.
Various works are proposed to deal with self-supervised learning from different perspectives.
We propose UniGrad, a simple but effective gradient form for self-supervised learning.
arXiv Detail & Related papers (2021-12-09T18:59:57Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost (a minimal per-sample weighting sketch appears after this list).
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the arts on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- Multi-Loss Weighting with Coefficient of Variations [19.37721431024278]
We propose a weighting scheme based on the coefficient of variations and set the weights based on properties observed while training the model.
The proposed method incorporates a measure of uncertainty to balance the losses, and as a result the loss weights evolve during training without requiring another (learning based) optimisation.
The validity of the approach is shown empirically for depth estimation and semantic segmentation on multiple datasets.
arXiv Detail & Related papers (2020-09-03T14:51:19Z)
- Multi-step Estimation for Gradient-based Meta-learning [3.4376560669160385]
We propose a simple and straightforward method to reduce the cost by reusing the same gradient in a window of inner steps.
We show that our method significantly reduces training time and memory usage, maintaining competitive accuracies, or even outperforming in some cases.
arXiv Detail & Related papers (2020-06-08T00:37:01Z)
- Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation [117.29799759864127]
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Deep networks can easily forget the non-dominant cases during the learning process, resulting in unsatisfactory performance.
We propose a memory-augmented network to learn and memorize the representative prototypes that cover diverse samples universally.
arXiv Detail & Related papers (2020-01-06T01:07:46Z)
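As referenced in the Attentional-Biased Stochastic Gradient Descent entry above, one way to assign an individual importance weight to each mini-batch sample is a softmax over scaled per-sample losses. The snippet below is a hedged, minimal sketch of that idea; the function name, the `lam` temperature, and the rescaling are illustrative assumptions rather than the authors' exact formulation.

```python
import torch


def per_sample_weights(losses: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Illustrative per-sample importance weights (hypothetical sketch).

    Harder examples (larger loss) receive larger weights through a softmax
    over scaled losses; `lam` controls how sharply weighting focuses on them.
    """
    # Detach so the weights act as constants in the backward pass.
    w = torch.softmax(losses.detach() / lam, dim=0)
    return w * losses.numel()  # rescale so the weights average to 1


# Drop-in use with an ordinary momentum-SGD training step:
# per_example = torch.nn.functional.cross_entropy(model(x), y, reduction="none")  # shape (B,)
# loss = (per_sample_weights(per_example, lam=1.0) * per_example).mean()
# loss.backward(); optimizer.step()
```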
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.