Selectively Hard Negative Mining for Alleviating Gradient Vanishing in
Image-Text Matching
- URL: http://arxiv.org/abs/2303.00181v1
- Date: Wed, 1 Mar 2023 02:15:07 GMT
- Title: Selectively Hard Negative Mining for Alleviating Gradient Vanishing in
Image-Text Matching
- Authors: Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Zhongtian Du
- Abstract summary: Most existing Image-Text Matching (ITM) models suffer from gradient vanishing at the beginning of training.
We propose a Selectively Hard Negative Mining (SelHN) strategy, which chooses whether to mine hard negative samples.
SelHN can be applied to existing ITM models in a plug-and-play manner to give them better training behavior.
- Score: 15.565068934153983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, a series of Image-Text Matching (ITM) methods have achieved impressive
performance. However, we observe that most existing ITM models suffer from
gradient vanishing at the beginning of training, which makes these models
prone to falling into local minima. Most ITM models adopt triplet loss with
Hard Negative mining (HN) as the optimization objective. We find that
optimizing an ITM model using only the hard negative samples can easily lead to
gradient vanishing. In this paper, we derive the condition under which the
gradient vanishes during training. When the difference between the positive
pair similarity and the negative pair similarity is close to 0, the gradients
on both the image and text encoders will approach 0. To alleviate the gradient
vanishing problem, we propose a Selectively Hard Negative Mining (SelHN)
strategy, which chooses whether to mine hard negative samples according to the
gradient vanishing condition. SelHN can be applied to existing ITM models in a
plug-and-play manner to give them better training behavior. To further ensure the
back-propagation of gradients, we construct a Residual Visual Semantic
Embedding model with SelHN, denoted as RVSE++. Extensive experiments on two ITM
benchmarks demonstrate the strength of RVSE++, achieving state-of-the-art
performance.
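The objective described above is, in most ITM models, a hinge-based triplet loss whose negatives come from the current mini-batch. As a concrete illustration, here is a minimal PyTorch-style sketch of such a loss with hard-negative mining, plus a SelHN-like gate that skips mining for anchors whose positive-negative similarity gap is near zero; the function name, the margin and threshold values, and the fallback of summing over all negatives are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code): a VSE-style triplet loss with
# hard-negative mining, plus a SelHN-like gate that skips hard-negative
# mining for anchors whose positive/negative similarity gap is near zero,
# i.e. the gradient-vanishing condition described in the abstract.
# `margin`, `gate_eps`, and the all-negatives fallback are assumptions.
import torch


def selhn_triplet_loss(sim: torch.Tensor, margin: float = 0.2, gate_eps: float = 0.05) -> torch.Tensor:
    """sim: (B, B) image-text similarity matrix; sim[i, i] is the positive pair."""
    B = sim.size(0)
    pos = sim.diag().view(B, 1)                                   # s(i, t_i)
    diag = torch.eye(B, dtype=torch.bool, device=sim.device)

    # Hinge cost against every in-batch negative (image->text and text->image).
    cost_t = (margin + sim - pos).clamp(min=0).masked_fill(diag, 0)
    cost_i = (margin + sim - pos.t()).clamp(min=0).masked_fill(diag, 0)

    # Hard-negative mining keeps only the largest violation per anchor.
    hn_t = cost_t.max(dim=1).values
    hn_i = cost_i.max(dim=0).values

    # SelHN-style gate: if the gap between the positive similarity and the
    # hardest negative similarity is close to 0, hard negatives alone give
    # near-zero gradients, so fall back to summing over all negatives.
    neg_t = sim.masked_fill(diag, float("-inf")).max(dim=1).values
    neg_i = sim.masked_fill(diag, float("-inf")).max(dim=0).values
    loss_t = torch.where((pos.squeeze(1) - neg_t).abs() > gate_eps, hn_t, cost_t.sum(dim=1))
    loss_i = torch.where((pos.squeeze(1) - neg_i).abs() > gate_eps, hn_i, cost_i.sum(dim=0))

    return (loss_t + loss_i).mean()
```

With L2-normalized embeddings, a call such as selhn_triplet_loss(img_emb @ txt_emb.t()) reproduces the cosine-similarity setting typical of VSE-style models; the actual SelHN gate follows the condition derived in the paper rather than this fixed threshold.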
Related papers
- Characterizing Model Robustness via Natural Input Gradients [37.97521090347974]
We show the surprising effectiveness of instead regularizing the gradient with respect to model inputs on natural examples only.
On ImageNet-1k, Gradient Norm training achieves > 90% of the performance of state-of-the-art PGD-3 Adversarial Training (52% vs. 56%), while using only 60% of the cost of the state-of-the-art and no complex adversarial optimization.
arXiv Detail & Related papers (2024-09-30T09:41:34Z)
- Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration [79.04007257606862]
This paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself.
Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks.
arXiv Detail & Related papers (2023-09-12T07:50:54Z) - The Equalization Losses: Gradient-Driven Training for Long-tailed Object
Recognition [84.51875325962061]
We propose a gradient-driven training mechanism to tackle the long-tail problem.
We introduce a new family of gradient-driven loss functions, namely equalization losses.
Our method consistently outperforms the baseline models.
arXiv Detail & Related papers (2022-10-11T16:00:36Z)
- Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into natural-looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv Detail & Related papers (2021-11-09T14:40:24Z)
- Score-based diffusion models for accelerated MRI [35.3148116010546]
We introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging.
Our model requires magnitude images only for training, and yet is able to reconstruct complex-valued data, and even extends to parallel imaging.
arXiv Detail & Related papers (2021-10-08T08:42:03Z)
- Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning [94.35586521144117]
We investigate whether applying contrastive learning to fine-tuning would bring further benefits.
We propose Contrast-regularized tuning (Core-tuning), a novel approach for fine-tuning contrastive self-supervised visual models.
arXiv Detail & Related papers (2021-02-12T16:31:24Z)
- Sharpness-Aware Minimization for Efficiently Improving Generalization [36.87818971067698]
We introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness.
Sharpness-Aware Minimization (SAM) seeks parameters that lie in neighborhoods having uniformly low loss.
We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.
arXiv Detail & Related papers (2020-10-03T19:02:10Z)
- SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner.
A key to the success of such contrastive learning methods is how to draw positive and negative samples.
In this paper, we propose SCE for unsupervised network embedding only using negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.