Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
- URL: http://arxiv.org/abs/2502.20604v1
- Date: Fri, 28 Feb 2025 00:07:45 GMT
- Title: Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
- Authors: Hao Xuan, Bokai Yang, Xingyu Li
- Abstract summary: This study delves into the often-overlooked parameter within the softmax function, known as "temperature." Our empirical studies, adopting convolutional neural networks and transformers, reveal that moderate temperatures generally introduce better overall performance. For the first time, we discover a surprising benefit of elevated temperatures: enhanced model robustness against common corruption, natural perturbation, and non-targeted adversarial attacks like Projected Gradient Descent.
- Score: 8.934328206473456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The softmax function is a fundamental component in deep learning. This study delves into the often-overlooked parameter within the softmax function, known as "temperature," providing novel insights into the practical and theoretical aspects of temperature scaling for image classification. Our empirical studies, adopting convolutional neural networks and transformers on multiple benchmark datasets, reveal that moderate temperatures generally introduce better overall performance. Through extensive experiments and rigorous theoretical analysis, we explore the role of temperature scaling in model training and unveil that temperature not only influences learning step size but also shapes the model's optimization direction. Moreover, for the first time, we discover a surprising benefit of elevated temperatures: enhanced model robustness against common corruption, natural perturbation, and non-targeted adversarial attacks like Projected Gradient Descent. We extend our discoveries to adversarial training, demonstrating that, compared to the standard softmax function with the default temperature value, higher temperatures have the potential to enhance adversarial training. The insights of this work open new avenues for improving model performance and security in deep learning applications.
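The temperature scaling discussed in the abstract refers to dividing the logits by a scalar T before applying softmax. A minimal sketch of this standard operation (the function name and example values are illustrative, not taken from the paper):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: divide logits by T before normalizing.

    T > 1 flattens the output distribution; T < 1 sharpens it toward
    the argmax; T = 1 recovers the standard softmax.
    """
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, T=1.0))   # standard softmax
print(softmax_with_temperature(logits, T=10.0))  # flatter distribution
```

As the comparison suggests, an elevated temperature spreads probability mass more evenly across classes, which is the regime the paper associates with improved robustness.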
Related papers
- Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification [0.0]
In deep learning-based classification tasks, the temperature parameter $T$ critically influences the output distribution and overall performance.
This study presents a novel theoretical insight that the optimal temperature $T*$ is uniquely determined by the dimensionality of the feature representations.
We develop an empirical formula to estimate $T*$ without additional training while also introducing a corrective scheme to refine $T*$ based on the number of classes and task complexity.
arXiv Detail & Related papers (2025-04-22T05:14:38Z)
- A Multimodal Physics-Informed Neural Network Approach for Mean Radiant Temperature Modeling [0.0]
This study introduces a Physics-Informed Neural Network (PINN) approach that integrates shortwave and longwave radiation modeling with deep learning techniques.
By leveraging a multimodal dataset that includes meteorological data, built environment characteristics, and fisheye image-derived shading information, our model enhances predictive accuracy while maintaining physical consistency.
arXiv Detail & Related papers (2025-03-11T14:36:08Z)
- Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures.
arXiv Detail & Related papers (2024-11-14T18:31:39Z)
- To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO [68.69840111477367]
We present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.
Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models.
arXiv Detail & Related papers (2024-04-06T09:55:03Z)
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques [0.0]
This research project investigated the correlation between a 10 Hz time series of thermocouple temperatures and turbulent kinetic energy (TKE).
Wind speeds were collected from a small experimental prescribed burn at the Silas Little Experimental Forest in New Jersey, USA.
The project achieves high accuracy in predicting TKE by employing various machine learning models.
arXiv Detail & Related papers (2023-11-09T03:47:49Z)
- Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- A Three-regime Model of Network Pruning [47.92525418773768]
We use temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance.
A key empirical result we identify is a sharp transition phenomenon: depending on the value of a load-like parameter in the pruned model, increasing the value of a temperature-like parameter in the pre-pruned model may either enhance or impair subsequent pruning performance.
Our model reveals that the dichotomous effect of high temperature is associated with transitions between distinct types of global structures in the post-pruned model.
arXiv Detail & Related papers (2023-05-28T08:09:25Z)
- Physics-constrained deep learning postprocessing of temperature and humidity [0.0]
We propose to achieve physical consistency in deep learning-based postprocessing models.
We find that constraining a neural network to enforce thermodynamic state equations yields physically-consistent predictions.
arXiv Detail & Related papers (2022-12-07T09:31:25Z)
- Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z)
- Temperature check: theory and practice for training models with softmax-cross-entropy losses [21.073524360170833]
We develop a theory of early learning for models trained with softmax-cross-entropy loss.
We find that generalization performance depends strongly on the temperature, but only weakly on the initial logit magnitude.
arXiv Detail & Related papers (2020-10-14T18:26:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.