Temperature-Free Loss Function for Contrastive Learning
- URL: http://arxiv.org/abs/2501.17683v1
- Date: Wed, 29 Jan 2025 14:43:21 GMT
- Title: Temperature-Free Loss Function for Contrastive Learning
- Authors: Bum Jun Kim, Sang Woo Kim
- Abstract summary: We propose a novel method to deploy InfoNCE loss without temperature.
Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss.
The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
- Score: 7.229820415732795
- Abstract: As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is applying InfoNCE loss: by capturing the similarities between pairs, InfoNCE loss enables learning the representation of data. Despite its success, adopting InfoNCE loss requires tuning a temperature, which is a core hyperparameter for calibrating similarity scores. Although several studies have emphasized its significance and its sensitivity to performance, searching for a valid temperature requires extensive trial-and-error experiments, which increases the difficulty of adopting InfoNCE loss. To address this difficulty, we propose a novel method to deploy InfoNCE loss without temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to hyperparameter-free deployment, we observed that the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
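The modification is concrete enough to sketch: cosine similarities, which lie in (-1, 1), are passed through the inverse hyperbolic tangent instead of being divided by a temperature. Below is a minimal PyTorch sketch of both variants; the clamp epsilon and the exact batch layout are assumptions on my part, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.07):
    """Standard InfoNCE: cosine similarities divided by a tuned temperature."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = (z1 @ z2.t()) / tau                         # (N, N) pairwise similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

def info_nce_atanh(z1, z2, eps=1e-4):
    """Temperature-free variant: atanh replaces the division by tau.

    atanh maps cosine similarity from (-1, 1) to (-inf, inf), so no
    hand-tuned temperature is needed; the clamp guards against the
    singularities at exactly +/-1 (a numerical detail assumed here).
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sims = (z1 @ z2.t()).clamp(-1 + eps, 1 - eps)
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(torch.atanh(sims), labels)
```

Since atanh steepens sharply near +/-1, it behaves somewhat like a small temperature for highly similar pairs, but without a tunable hyperparameter.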
Related papers
- Optimizing YOLOv5s Object Detection through Knowledge Distillation algorithm [37.37311465537091]
This paper explores the application of knowledge distillation technology in target detection tasks.
By using YOLOv5l as the teacher network and a smaller YOLOv5s as the student network, we found that the student's detection accuracy gradually improved as the distillation temperature increased (the role of this temperature is sketched after this entry).
arXiv Detail & Related papers (2024-10-16T05:58:08Z)
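The YOLOv5 specifics aside, the role the distillation temperature plays here can be seen in the standard Hinton-style soft-target loss. The sketch below is that generic loss, not the paper's exact training pipeline.

```python
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss. A higher T softens the teacher's
    distribution, exposing more inter-class structure to the student;
    the T*T factor keeps gradient magnitudes comparable across T."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)
```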
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and models of different scales such as LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function.
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Dynamically Scaled Temperature in Self-Supervised Contrastive Learning [11.133502139934437]
We focus on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine-similarity-dependent temperature scaling function (a hypothetical sketch follows this entry).
Experimental evidence shows that the proposed framework outperforms contrastive-loss-based SSL algorithms.
arXiv Detail & Related papers (2023-08-02T13:31:41Z)
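The abstract does not give the scaling function itself, so the sketch below only illustrates the general shape of the idea: a temperature that is itself a function of each pair's cosine similarity. The dynamic_tau schedule is entirely hypothetical.

```python
import torch
import torch.nn.functional as F

def dynamic_tau(sims, tau_min=0.07, tau_max=0.3):
    """Hypothetical schedule: pairs that are already similar get a lower
    (sharper) temperature. Purely illustrative, not the paper's function."""
    return tau_min + (tau_max - tau_min) * (1.0 - sims) / 2.0

def info_nce_dynamic_tau(z1, z2):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sims = z1 @ z2.t()
    # tau depends on similarity; detach so no gradient flows through tau itself
    logits = sims / dynamic_tau(sims.detach())
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```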
- Curriculum Temperature for Knowledge Distillation [30.94721463833605]
We propose a curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD).
CTKD controls the task difficulty level during the student's learning career through a dynamic and learnable temperature.
As an easy-to-use plug-in technique, CTKD can be seamlessly integrated into existing knowledge distillation frameworks.
arXiv Detail & Related papers (2022-11-29T14:10:35Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks [0.7219077740523682]
We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling.
We show that when data is plentiful, complex models like neural networks yield better performance, but they are prone to fail when data is limited.
We propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy (a rough sketch follows this entry).
arXiv Detail & Related papers (2022-07-31T16:20:06Z)
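As a rough illustration of the idea, the sketch below scales logits by a temperature that grows with predictive entropy. The linear form tau = a + b * H and its coefficients are assumptions, not the paper's fitted model.

```python
import torch

def entropy_scaled_probs(logits, a=1.0, b=0.5):
    """Hypothetical entropy-based scaling: higher-entropy predictions get a
    larger temperature and thus lower confidence. In practice a and b would
    be fit on held-out validation data."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1, keepdim=True)
    tau = a + b * entropy                       # per-sample temperature, tau >= a > 0
    return torch.softmax(logits / tau, dim=-1)
```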
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for miscalibrated neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy (see the sketch after this entry).
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
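A minimal sketch of per-input temperature prediction, assuming a small MLP head on top of the backbone features; the architecture and the Softplus parameterization are illustrative choices, not the authors'.

```python
import torch
import torch.nn as nn

class TemperatureHead(nn.Module):
    """Hypothetical per-sample temperature predictor: a small MLP maps
    backbone features to one positive temperature per input."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Softplus(),        # keeps the temperature positive
        )

    def forward(self, features, logits):
        tau = self.net(features) + 1e-3  # floor avoids division by ~0
        return logits / tau              # per-sample calibrated logits
```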
- Temperature as Uncertainty in Contrastive Learning [5.8927489390473164]
We propose a simple way to generate uncertainty scores for contrastive methods by re-purposing temperature.
We call this approach "Temperature as Uncertainty", or TaU.
In summary, TaU is a simple yet versatile method for generating uncertainties for contrastive learning (one possible parameterization is sketched after this entry).
arXiv Detail & Related papers (2021-10-08T23:08:30Z)
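Reading the abstract, the idea seems to be that a per-embedding temperature learned during contrastive training doubles as an uncertainty score at inference. The head below is one guess at a parameterization, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaUHead(nn.Module):
    """Sketch of 'Temperature as Uncertainty': a linear head produces a
    per-embedding temperature that scales the contrastive logits during
    training and is read out directly as an uncertainty score at test time.
    The linear head and softplus are assumptions."""
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, z):
        # Higher tau => flatter similarity distribution => less certain sample.
        return F.softplus(self.head(F.normalize(z, dim=1))) + 1e-3
```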
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether the direction in which a parameter changed in the past is aligned with the direction of the current gradient (an illustrative update rule is sketched after this entry).
Our method outperforms previous adaptive learning-rate-based algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
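The described mechanism lends itself to a sketch: keep a per-parameter history of past updates and raise or lower the per-element learning rate by its sign agreement with the current gradient. The update below is written in that spirit; it is not the authors' exact rule, and the state handling is a simplification.

```python
import torch

@torch.no_grad()
def adarem_like_step(params, state, lr=0.01, beta=0.9, gamma=0.5):
    """Illustrative AdaRem-style update: the per-element learning rate is
    raised where the current gradient agrees in sign with an EMA of past
    updates and lowered where it opposes it. Pass state as an initially
    empty dict and reuse it across calls."""
    for i, p in enumerate(params):
        if p.grad is None:
            continue
        m = state.setdefault(i, torch.zeros_like(p))   # EMA of past updates
        agree = torch.sign(m) * torch.sign(p.grad)     # +1 aligned, -1 opposed, 0 fresh
        step = lr * (1.0 + gamma * agree) * p.grad     # per-element resilient step
        p.sub_(step)
        state[i] = beta * m + (1.0 - beta) * (-step)
```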
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.