Mathematical Justification of Hard Negative Mining via Isometric
Approximation Theorem
- URL: http://arxiv.org/abs/2210.11173v1
- Date: Thu, 20 Oct 2022 11:21:17 GMT
- Title: Mathematical Justification of Hard Negative Mining via Isometric
Approximation Theorem
- Authors: Albert Xu, Jhih-Yi Hsieh, Bhaskar Vundurthy, Eliana Cohen, Howie
Choset, Lu Li
- Abstract summary: In deep metric learning, the Triplet Loss has emerged as a popular method to learn many computer vision and natural language processing tasks.
One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point.
Researchers predominantly solve this problem by using triplet mining strategies.
While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success.
- Score: 18.525368151998386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In deep metric learning, the Triplet Loss has emerged as a popular method to
learn many computer vision and natural language processing tasks such as facial
recognition, object detection, and visual-semantic embeddings. One issue that
plagues the Triplet Loss is network collapse, an undesirable phenomenon where
the network projects the embeddings of all data onto a single point.
Researchers predominantly solve this problem by using triplet mining
strategies. While hard negative mining is the most effective of these
strategies, existing formulations lack strong theoretical justification for
their empirical success. In this paper, we utilize the mathematical theory of
isometric approximation to show an equivalence between the Triplet Loss sampled
by hard negative mining and an optimization problem that minimizes a
Hausdorff-like distance between the neural network and its ideal counterpart
function. This provides the theoretical justification for hard negative
mining's empirical efficacy. In addition, our novel application of the
isometric approximation theorem provides the groundwork for future forms of
hard negative mining that avoid network collapse. Our theory can also be
extended to analyze other Euclidean space-based metric learning methods like
Ladder Loss or Contrastive Learning.
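For concreteness, the following is a minimal PyTorch sketch of the Triplet Loss with in-batch ("batch-hard") negative mining, the sampling scheme whose empirical efficacy the paper justifies; the function name, margin value, and batching convention are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.2) -> torch.Tensor:
    """Triplet loss with in-batch hard positive/negative mining (illustrative).

    embeddings: (N, D) embedding vectors produced by the network
    labels:     (N,) integer class labels
    """
    # Pairwise Euclidean distances between all embeddings in the batch.
    dist = torch.cdist(embeddings, embeddings, p=2)                      # (N, N)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)                    # same-class mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=dist.device)   # self-pairs

    # Hardest positive: farthest same-class sample (excluding the anchor itself).
    pos_dist = dist.masked_fill(~same | eye, float('-inf')).max(dim=1).values

    # Hardest negative: closest different-class sample.
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values

    # Standard triplet hinge; network collapse corresponds to all distances shrinking to zero.
    return F.relu(pos_dist - neg_dist + margin).mean()
```

Each anchor is compared against its hardest in-batch positive and hardest in-batch negative; the paper's result is that minimizing this hard-negative-sampled loss is equivalent to minimizing a Hausdorff-like distance between the network and its ideal counterpart function.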
Related papers
- Learning Latent Graph Structures and their Uncertainty [63.95971478893842]
Graph Neural Networks (GNNs) use relational information as an inductive bias to enhance the model's accuracy.
As task-relevant relations might be unknown, graph structure learning approaches have been proposed to learn them while solving the downstream prediction task.
arXiv Detail & Related papers (2024-05-30T10:49:22Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
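For reference, here is a minimal sketch of the classical binary unhinged loss, which is linear in the margin, l(y, s) = 1 - y * s; the multi-class variant analyzed in the paper above may differ in detail, so this is only an illustration of the basic form.

```python
import torch

def unhinged_loss(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Classical binary unhinged loss: l(y, s) = 1 - y * s.

    scores:  (N,) raw model outputs
    targets: (N,) labels in {-1, +1}
    Being linear in the margin, it has no flat region, which is part of what
    makes its closed-form training dynamics tractable to analyze.
    """
    return (1.0 - targets * scores).mean()
```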
- Mean Field Theory in Deep Metric Learning [0.0]
We develop an approach to design classification-based loss functions from pair-based ones.
We derive two new loss functions, MeanFieldContrastive and MeanFieldClassWiseMultiSimilarity losses, with reduced training complexity.
We extensively evaluate these derived loss functions on three image-retrieval datasets.
arXiv Detail & Related papers (2023-06-27T10:33:37Z)
- No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of sampled quantities of optimization on two key paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv Detail & Related papers (2023-06-20T22:10:40Z)
- Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z)
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- On the Minimal Adversarial Perturbation for Deep Neural Networks with Provable Estimation Error [65.51757376525798]
The existence of adversarial perturbations has opened an interesting research line on provable robustness.
No provable results have been presented to estimate and bound the error committed.
This paper proposes two lightweight strategies to find the minimal adversarial perturbation.
The obtained results show that the proposed strategies approximate the theoretical distance and robustness for samples close to the classification boundary, leading to provable guarantees against any adversarial attacks.
arXiv Detail & Related papers (2022-01-04T16:40:03Z)
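The paper's two lightweight strategies are not reproduced here; as a generic illustration of estimating a minimal adversarial perturbation, one can bisect over the perturbation radius and query an attack oracle. The `attack` callable, radius bounds, and iteration count below are hypothetical, not the paper's method.

```python
import torch

def minimal_perturbation_radius(model, x, y, attack, eps_hi=1.0, iters=20):
    """Bisection estimate of the smallest radius at which a given attack
    flips the prediction on a single input x of shape (1, ...) with true
    integer label y. attack(model, x, y, eps) is a hypothetical oracle that
    returns an adversarial candidate within radius eps.
    Returns an upper bound on the minimal adversarial perturbation radius.
    """
    lo, hi = 0.0, eps_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        x_adv = attack(model, x, y, mid)
        if model(x_adv).argmax(dim=1).item() != y:
            hi = mid   # attack succeeds: the minimal radius is at most mid
        else:
            lo = mid   # attack fails at this radius: search larger radii
    return hi
```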
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
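One plausible reading of the mixing idea above, sketched in PyTorch: blend cross entropy with an expectation-style term, taken here as the expected misclassification probability 1 - p_correct under the softmax. The mixing weight `alpha` and this exact definition are illustrative assumptions, not the paper's protocol.

```python
import torch
import torch.nn.functional as F

def mixed_ce_expectation_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              alpha: float = 0.5) -> torch.Tensor:
    """Blend cross entropy with an 'expectation' term (illustrative).

    Cross entropy emphasizes hard samples; the expectation term, read here as
    the expected misclassification probability under the softmax, weights all
    samples more evenly. alpha is an assumed mixing weight.
    """
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    p_correct = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    expectation = (1.0 - p_correct).mean()
    return (1.0 - alpha) * ce + alpha * expectation
```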
- LoOp: Looking for Optimal Hard Negative Embeddings for Deep Metric Learning [17.571160136568455]
We propose a novel approach that looks for optimal hard negatives (LoOp) in the embedding space.
Unlike mining-based methods, our approach considers the entire space between pairs of embeddings to calculate the optimal hard negatives.
arXiv Detail & Related papers (2021-08-20T19:21:33Z)
- Optimization and Generalization of Regularization-Based Continual Learning: a Loss Approximation Viewpoint [35.5156045701898]
We provide a novel viewpoint of regularization-based continual learning by formulating it as a second-order Taylor approximation of the loss function of each task.
Based on this viewpoint, we study the optimization aspects (i.e., convergence) as well as generalization properties (i.e., finite-sample guarantees) of regularization-based continual learning.
arXiv Detail & Related papers (2020-06-19T06:08:40Z)
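In practice, the second-order Taylor viewpoint above corresponds to a quadratic penalty around the previous task's optimum weighted by a curvature estimate, as in EWC-style methods. Below is a minimal sketch under that reading; the function and argument names are illustrative, not from the paper.

```python
import torch

def quadratic_regularization_penalty(model: torch.nn.Module,
                                     old_params: dict,
                                     curvature: dict,
                                     strength: float = 1.0) -> torch.Tensor:
    """Second-order-Taylor-style penalty for regularization-based continual
    learning: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2.

    old_params: parameter values at the previous task's optimum (theta*)
    curvature:  per-parameter curvature estimates (e.g., a diagonal Fisher)
    """
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, param in model.named_parameters():
        if name in old_params:
            diff = param - old_params[name]
            penalty = penalty + (curvature[name] * diff.pow(2)).sum()
    return 0.5 * strength * penalty

# Usage sketch: total_loss = task_loss + quadratic_regularization_penalty(model, theta_star, fisher)
```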
- Towards Certified Robustness of Distance Metric Learning [53.96113074344632]
We advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms.
We show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness.
arXiv Detail & Related papers (2020-06-10T16:51:53Z)
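As a rough, uncertified illustration of "an adversarial margin in the input space" for metric learning: take a single signed-gradient step on the anchor input and enforce the triplet constraint at the perturbed input. This is a generic sketch, not the cited paper's certified formulation; `encoder`, `eps`, and `margin` are assumed names and values.

```python
import torch
import torch.nn.functional as F

def input_margin_triplet_loss(encoder, anchor, positive, negative,
                              margin=0.2, eps=0.05):
    """Triplet loss evaluated at an adversarially perturbed anchor input
    (illustrative; the certified approach in the cited paper differs)."""
    # One signed-gradient (FGSM-style) step to approximate the worst-case
    # anchor inside an eps-ball in input space.
    anchor_req = anchor.clone().requires_grad_(True)
    loss = F.triplet_margin_loss(encoder(anchor_req), encoder(positive),
                                 encoder(negative), margin=margin)
    grad, = torch.autograd.grad(loss, anchor_req)
    anchor_adv = (anchor + eps * grad.sign()).detach()
    # Enforce the margin constraint at the perturbed input.
    return F.triplet_margin_loss(encoder(anchor_adv), encoder(positive),
                                 encoder(negative), margin=margin)
```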
This list is automatically generated from the titles and abstracts of the papers on this site.