Comparing Contrastive and Triplet Loss: Variance Analysis and Optimization Behavior
- URL: http://arxiv.org/abs/2510.02161v2
- Date: Mon, 06 Oct 2025 05:19:04 GMT
- Title: Comparing Contrastive and Triplet Loss: Variance Analysis and Optimization Behavior
- Authors: Donghuo Zeng
- Abstract summary: We show that triplet loss preserves greater variance within and across classes, supporting finer-grained distinctions in the learned representations. In contrast, contrastive loss tends to compact intra-class embeddings, which may obscure subtle semantic differences. We find that contrastive loss drives many small updates early on, while triplet loss produces fewer but stronger updates that sustain learning on hard examples.
- Score: 2.608092703580602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive loss and triplet loss are widely used objectives in deep metric learning, yet their effects on representation quality remain insufficiently understood. We present a theoretical and empirical comparison of these losses, focusing on intra- and inter-class variance and optimization behavior (e.g., greedy updates). Through task-specific experiments with consistent settings on synthetic data and real datasets (MNIST, CIFAR-10), we show that triplet loss preserves greater variance within and across classes, supporting finer-grained distinctions in the learned representations. In contrast, contrastive loss tends to compact intra-class embeddings, which may obscure subtle semantic differences. To better understand their optimization dynamics, we examine the loss-decay rate, active ratio, and gradient norm, and find that contrastive loss drives many small updates early on, while triplet loss produces fewer but stronger updates that sustain learning on hard examples. Finally, across both classification and retrieval tasks on the MNIST, CIFAR-10, CUB-200, and CARS196 datasets, our results consistently show that triplet loss yields superior performance, suggesting triplet loss for detail retention and hard-sample focus, and contrastive loss for smoother, broad-based embedding refinement.
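For reference, the two objectives compared in the abstract can be written down in a few lines. The following is a minimal PyTorch sketch of the standard pairwise contrastive loss and the margin-based triplet loss; the margin values and the choice of Euclidean distance are conventional defaults, not settings taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pairwise contrastive loss: positive pairs are pulled together,
    negative pairs are pushed apart until they are `margin` apart.
    z1, z2: (B, D) embeddings; same_class: (B,) floats in {0, 1}."""
    d = F.pairwise_distance(z1, z2)                       # (B,)
    pos = same_class * d.pow(2)                           # pull positives
    neg = (1 - same_class) * F.relu(margin - d).pow(2)    # push negatives
    return 0.5 * (pos + neg).mean()

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Margin-based triplet loss: d(a, p) must undercut d(a, n) by at
    least `margin`; triplets already satisfying the margin contribute
    zero, which is why triplet updates are fewer but concentrated on
    hard examples, as the abstract observes."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()
```

The hinge in `triplet_loss` is also what makes the abstract's "active ratio" diagnostic natural: only triplets with a nonzero hinge produce gradient.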
Related papers
- Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. Our results establish a principled foundation and offer practical guidance for loss selection in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z)
- Variance & Greediness: A comparative study of metric-learning losses [5.102429604787588]
Metric learning is central to retrieval, yet its effects on embedding geometry and optimization dynamics are not well understood. We introduce a diagnostic framework, VARIANCE (intra-/inter-class variance) and GREEDINESS (active ratio and gradient norms), to compare seven representative losses. Our analysis reveals that Triplet and SCL preserve higher within-class variance and clearer inter-class margins, leading to stronger top-1 retrieval in fine-grained settings.
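The summary does not spell out how these diagnostics are computed; a straightforward reading is sketched below, assuming intra-class variance is measured against class centroids, inter-class variance against the global centroid, and the active ratio as the fraction of triplets with a nonzero hinge. The paper's exact definitions may differ.

```python
import torch

def class_variances(embeddings, labels):
    """Intra-class variance: mean squared distance of embeddings to
    their class centroid.  Inter-class variance: mean squared distance
    of class centroids to the global centroid.  A plausible reading of
    the VARIANCE diagnostic, not the paper's exact definition."""
    classes = labels.unique()
    centroids = torch.stack([embeddings[labels == c].mean(0) for c in classes])
    intra = torch.stack([
        (embeddings[labels == c] - centroids[i]).pow(2).sum(-1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    inter = (centroids - centroids.mean(0)).pow(2).sum(-1).mean()
    return intra, inter

def active_ratio(d_ap, d_an, margin=0.2):
    """Fraction of triplets still producing gradient (hinge > 0), a
    plausible reading of the GREEDINESS 'active ratio' diagnostic."""
    return (d_ap - d_an + margin > 0).float().mean()
```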
arXiv Detail & Related papers (2026-01-29T09:28:30Z)
- Expressive Losses for Verified Robustness via Convex Combinations [67.54357965665676]
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
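The summary leaves the loss family implicit; a common way to build such expressive losses is a convex combination of an adversarial (attack-based) loss and a verified (over-approximated worst-case) loss, with the coefficient controlling how much over-approximation enters. The sketch below is a generic illustration of that idea under assumed inputs, not the paper's exact recipe.

```python
import torch.nn.functional as F

def convex_combination_loss(adv_logits, verified_logits, targets, alpha=0.5):
    """Mix an adversarial lower-bound loss with a verified worst-case
    loss; `alpha` plays the role of the over-approximation coefficient.
    Both logit tensors are assumed to come from an attack and from a
    bound-propagation method, respectively (hypothetical inputs)."""
    adv_loss = F.cross_entropy(adv_logits, targets)
    verified_loss = F.cross_entropy(verified_logits, targets)
    return (1 - alpha) * adv_loss + alpha * verified_loss
```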
arXiv Detail & Related papers (2023-05-23T12:20:29Z)
- Tuned Contrastive Learning [77.67209954169593]
We propose a novel contrastive loss function, the Tuned Contrastive Learning (TCL) loss.
TCL generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives.
We show how to extend TCL to the self-supervised setting and empirically compare it with various SOTA self-supervised learning methods.
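This summary does not give TCL's form or its tuning parameters; as a reference point, the batch-level, multi-positive contrastive objective that TCL generalizes looks roughly like the SupCon-style sketch below. This is a generic baseline, not the TCL loss itself.

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(z, labels, temperature=0.1):
    """SupCon-style loss: every sample is an anchor, and all other
    same-label samples in the batch act as positives.  A generic
    multi-positive baseline, not TCL itself."""
    z = F.normalize(z, dim=1)                         # cosine-similarity space
    sim = z @ z.t() / temperature                     # (B, B) pairwise logits
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~eye
    sim = sim.masked_fill(eye, float('-inf'))         # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_counts
    return loss.mean()
```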
arXiv Detail & Related papers (2023-05-18T03:26:37Z)
- Unified Loss of Pair Similarity Optimization for Vision-Language Retrieval [35.141916376979836]
There are two popular loss functions used for vision-language retrieval, i.e., triplet loss and contrastive learning loss.
This paper proposes a unified loss of pair similarity optimization for vision-language retrieval.
arXiv Detail & Related papers (2022-09-28T07:01:22Z)
- Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices [37.559461866831754]
We benchmark a variety of loss functions with different algorithmic choices for the deep AUROC optimization problem.
We highlight the essential choices such as positive sampling rate, regularization, normalization/activation, and weights.
Our findings show that although Adam-type methods are more competitive from a training perspective, they do not outperform others from a testing perspective.
arXiv Detail & Related papers (2022-03-27T00:47:00Z)
- Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
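Semi-hard negatives, mentioned above, are negatives farther from the anchor than the positive but still inside the margin. A common in-batch selection rule is sketched below; the fallback to the hardest negative is one conventional choice, not necessarily what ICR methods do.

```python
import torch

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Pick, per anchor, a negative with d(a,p) < d(a,n) < d(a,p)+margin;
    fall back to the closest negative when no semi-hard one exists."""
    d_ap = (anchor - positive).pow(2).sum(-1).sqrt()        # (B,)
    d_an = torch.cdist(anchor, candidates)                  # (B, N)
    semi_hard = (d_an > d_ap[:, None]) & (d_an < d_ap[:, None] + margin)
    masked = d_an.masked_fill(~semi_hard, float('inf'))
    idx = masked.argmin(dim=1)                              # closest semi-hard
    no_semi = ~semi_hard.any(dim=1)
    idx[no_semi] = d_an[no_semi].argmin(dim=1)              # hardest fallback
    return candidates[idx]
```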
arXiv Detail & Related papers (2022-02-14T15:18:00Z)
- Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions cover both consistency and robustness, including establishing the top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance.
arXiv Detail & Related papers (2021-12-30T00:27:30Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization objective helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
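One common reading of the expectation loss is the expected 0-1 error under the softmax distribution, i.e. 1 minus the probability of the correct class; the sketch below mixes it with cross entropy under that assumption. The paper's exact formulation and mixing schedule may differ.

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, alpha=0.5):
    """Convex mix of cross entropy with an expectation-style term,
    here taken as the expected 0-1 error under the softmax (an
    assumption, not necessarily the paper's exact definition)."""
    ce = F.cross_entropy(logits, targets)
    p_correct = F.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    expectation = (1.0 - p_correct).mean()
    return alpha * ce + (1.0 - alpha) * expectation
```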
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
- A Decidability-Based Loss Function [2.5919311269669003]
Biometric problems often use deep learning models to extract features, also known as embeddings, from images.
In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine.
The proposed approach is compared against the Softmax (cross-entropy), Triplets Soft-Hard, and Multi-Similarity losses on four different benchmarks.
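The decidability index here is presumably Daugman's d-prime between genuine-pair and impostor-pair distance distributions; a sketch under that assumption follows, with the negation into a loss being an assumed (but natural) reduction of the idea.

```python
import torch

def decidability_index(d_genuine, d_impostor, eps=1e-8):
    """Daugman-style d': separation between genuine-pair and
    impostor-pair distance distributions; larger is more decidable."""
    mu_g, mu_i = d_genuine.mean(), d_impostor.mean()
    var_g, var_i = d_genuine.var(), d_impostor.var()
    return (mu_i - mu_g).abs() / torch.sqrt(0.5 * (var_g + var_i) + eps)

def decidability_loss(d_genuine, d_impostor):
    """Negate d' so that minimizing the loss maximizes separability
    (an assumed conversion of the index into a training loss)."""
    return -decidability_index(d_genuine, d_impostor)
```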
arXiv Detail & Related papers (2021-09-12T14:26:27Z)
- Class Interference Regularization [7.248447600071719]
Contrastive losses yield state-of-the-art performance for person re-identification, face verification, and few-shot learning.
We propose a novel, simple, and effective regularization technique, the Class Interference Regularization (CIR).
CIR perturbs the output features by randomly moving them towards the average embeddings of the negative classes.
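The perturbation itself is simple to write down from the description above; the mixing coefficient and the per-sample random choice of a negative class in this sketch are assumptions.

```python
import torch

def class_interference_regularization(features, labels, strength=0.1):
    """Move each feature part-way towards the centroid of a randomly
    chosen negative class (assumes at least two classes per batch).
    `strength` is an assumed mixing coefficient."""
    classes = labels.unique()
    centroids = torch.stack([features[labels == c].mean(0) for c in classes])
    out = features.clone()
    for i in range(features.size(0)):
        neg = (classes != labels[i]).nonzero(as_tuple=True)[0]  # negative class idx
        j = neg[torch.randint(len(neg), (1,))].item()           # random negative
        out[i] = (1 - strength) * features[i] + strength * centroids[j]
    return out
```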
arXiv Detail & Related papers (2020-09-04T21:03:32Z)