Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
- URL: http://arxiv.org/abs/2502.20604v1
- Date: Fri, 28 Feb 2025 00:07:45 GMT
- Title: Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
- Authors: Hao Xuan, Bokai Yang, Xingyu Li
- Abstract summary: This study delves into the often-overlooked parameter within the softmax function, known as "temperature." Our empirical studies, adopting convolutional neural networks and transformers, reveal that moderate temperatures generally introduce better overall performance. For the first time, we discover a surprising benefit of elevated temperatures: enhanced model robustness against common corruption, natural perturbation, and non-targeted adversarial attacks like Projected Gradient Descent.
- Score: 8.934328206473456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The softmax function is a fundamental component in deep learning. This study delves into the often-overlooked parameter within the softmax function, known as "temperature," providing novel insights into the practical and theoretical aspects of temperature scaling for image classification. Our empirical studies, adopting convolutional neural networks and transformers on multiple benchmark datasets, reveal that moderate temperatures generally introduce better overall performance. Through extensive experiments and rigorous theoretical analysis, we explore the role of temperature scaling in model training and unveil that temperature not only influences learning step size but also shapes the model's optimization direction. Moreover, for the first time, we discover a surprising benefit of elevated temperatures: enhanced model robustness against common corruption, natural perturbation, and non-targeted adversarial attacks like Projected Gradient Descent. We extend our discoveries to adversarial training, demonstrating that, compared to the standard softmax function with the default temperature value, higher temperatures have the potential to enhance adversarial training. The insights of this work open new avenues for improving model performance and security in deep learning applications.
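The temperature scaling discussed in the abstract refers to dividing the logits by a scalar T before applying softmax. A minimal sketch of this standard operation (the function name and example values are illustrative, not taken from the paper):

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature-scaled softmax: divide logits by T before normalizing.

    T > 1 flattens the output distribution; T < 1 sharpens it toward
    the argmax; T = 1 recovers the standard softmax.
    """
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, T=1.0))   # standard softmax
print(softmax_with_temperature(logits, T=10.0))  # flatter distribution
```

As the comparison suggests, an elevated temperature spreads probability mass more evenly across classes, which is the regime the paper associates with improved robustness.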
Related papers
- Analytical Softmax Temperature Setting from Feature Dimensions for Model- and Domain-Robust Classification [0.0]
In deep learning-based classification tasks, the temperature parameter $T$ critically influences the output distribution and overall performance.
This study presents a novel theoretical insight that the optimal temperature $T*$ is uniquely determined by the dimensionality of the feature representations.
We develop an empirical formula to estimate $T*$ without additional training while also introducing a corrective scheme to refine $T*$ based on the number of classes and task complexity.
arXiv Detail & Related papers (2025-04-22T05:14:38Z)
- A Multimodal Physics-Informed Neural Network Approach for Mean Radiant Temperature Modeling [0.0]
This study introduces a Physics-Informed Neural Network (PINN) approach that integrates shortwave and longwave radiation modeling with deep learning techniques.
By leveraging a multimodal dataset that includes meteorological data, built environment characteristics, and fisheye image-derived shading information, our model enhances predictive accuracy while maintaining physical consistency.
arXiv Detail & Related papers (2025-03-11T14:36:08Z)
- Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures.
arXiv Detail & Related papers (2024-11-14T18:31:39Z)
- To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO [68.69840111477367]
We present a principled framework for learning a small yet generalizable temperature prediction network (TempNet) to improve LFMs.
Our experiments on LLMs and CLIP models demonstrate that TempNet greatly improves the performance of existing solutions or models.
arXiv Detail & Related papers (2024-04-06T09:55:03Z)
- Extreme Miscalibration and the Illusion of Adversarial Robustness [66.29268991629085]
Adversarial Training is often used to increase model robustness.
We show that this observed gain in robustness is an illusion of robustness (IOR).
We urge the NLP community to incorporate test-time temperature scaling into their robustness evaluations.
arXiv Detail & Related papers (2024-02-27T13:49:12Z)
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques [0.0]
This research project investigated the correlation between a 10 Hz time series of thermocouple temperatures and turbulent kinetic energy (TKE).
Wind speeds were collected from a small experimental prescribed burn at the Silas Little Experimental Forest in New Jersey, USA.
The project achieves high accuracy in predicting TKE by employing various machine learning models.
arXiv Detail & Related papers (2023-11-09T03:47:49Z)
- Machine learning enabled experimental design and parameter estimation for ultrafast spin dynamics [54.172707311728885]
We introduce a methodology that combines machine learning with Bayesian optimal experimental design (BOED).
Our method employs a neural network model for large-scale spin dynamics simulations for precise distribution and utility calculations in BOED.
Our numerical benchmarks demonstrate the superior performance of our method in guiding XPFS experiments, predicting model parameters, and yielding more informative measurements within limited experimental time.
arXiv Detail & Related papers (2023-06-03T06:19:20Z)
- A Three-regime Model of Network Pruning [47.92525418773768]
We use temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance.
A key empirical result we identify is a sharp transition phenomenon: depending on the value of a load-like parameter in the pruned model, increasing the value of a temperature-like parameter in the pre-pruned model may either enhance or impair subsequent pruning performance.
Our model reveals that the dichotomous effect of high temperature is associated with transitions between distinct types of global structures in the post-pruned model.
arXiv Detail & Related papers (2023-05-28T08:09:25Z)
- Physics-constrained deep learning postprocessing of temperature and humidity [0.0]
We propose to achieve physical consistency in deep learning-based postprocessing models.
We find that constraining a neural network to enforce thermodynamic state equations yields physically-consistent predictions.
arXiv Detail & Related papers (2022-12-07T09:31:25Z)
- Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z)
- Temperature check: theory and practice for training models with softmax-cross-entropy losses [21.073524360170833]
We develop a theory of early learning for models trained with softmax-cross-entropy loss.
We find that generalization performance depends strongly on the temperature, but only weakly on the initial logit magnitude.
arXiv Detail & Related papers (2020-10-14T18:26:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.