The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling
- URL: http://arxiv.org/abs/2602.14862v1
- Date: Mon, 16 Feb 2026 15:54:52 GMT
- Title: The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling
- Authors: Pierre-Alexandre Mattei, Bruno Loureiro
- Abstract summary: We show that increasing the temperature increases the uncertainty in the model in a very general sense. For LLMs, we challenge the common claim that increasing temperature increases diversity. We introduce two new characterisations of temperature scaling.
- Score: 22.839278056856433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temperature scaling is a simple method that allows one to control the uncertainty of probabilistic models. It is mostly used in two contexts: improving the calibration of classifiers and tuning the stochasticity of large language models (LLMs). In both cases, temperature scaling is the most popular method for the job. Despite its popularity, a rigorous theoretical analysis of the properties of temperature scaling has remained elusive. Here we investigate some of these properties. For classification, we show that increasing the temperature increases the uncertainty in the model in a very general sense (and in particular increases its entropy). However, for LLMs, we challenge the common claim that increasing temperature increases diversity. Furthermore, we introduce two new characterisations of temperature scaling. The first one is geometric: the tempered model is shown to be the information projection of the original model onto the set of models with a given entropy. The second characterisation clarifies the role of temperature scaling as a submodel of more general linear scalers such as matrix scaling and Dirichlet calibration: we show that temperature scaling is the only linear scaler that does not change the hard predictions of the model.
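The two elementary properties stated in the abstract (entropy increases with temperature, and the hard prediction is unchanged) can be checked with a minimal sketch; the logits here are illustrative and not taken from the paper:

```python
import math

def softmax_with_temperature(logits, T):
    """Temperature scaling: divide the logits by T before the softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

logits = [2.0, 1.0, 0.1]  # illustrative values
cold = softmax_with_temperature(logits, 0.5)
base = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)

# Higher temperature yields higher entropy (more uncertainty)...
assert entropy(cold) < entropy(base) < entropy(hot)
# ...while the hard prediction (argmax) is the same at every temperature.
assert cold.index(max(cold)) == base.index(max(base)) == hot.index(max(hot))
```

Dividing logits by a common positive constant is a monotone transformation, which is why the argmax survives; the paper's second characterisation says temperature scaling is the *only* linear scaler with this property.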
Related papers
- On the Entropy Calibration of Language Models [52.47557449370603]
We study the problem of entropy calibration, which asks whether a language model's entropy over generations matches its log loss on human text. We find that the observed scaling behavior is similar to what is predicted by the simplified setting. We prove that calibration is possible if we assume access to a black box that can fit models to predict the future entropy of text.
arXiv Detail & Related papers (2025-11-15T00:33:03Z)
- Machine Learning for Electron-Scale Turbulence Modeling in W7-X [35.18016233072556]
This paper presents machine-learning-driven reduced models for turbulence in the Wendelstein 7-X stellarator. Each model predicts the ETG heat flux as a function of three plasma parameters. Our models demonstrate robust performance and predictive accuracy comparable to the original reference simulations.
arXiv Detail & Related papers (2025-11-06T17:24:37Z)
- On the Role of Temperature Sampling in Test-Time Scaling [5.758728541863352]
We show that at large K, further scaling yields no gains, and certain hard questions remain unsolved regardless of the number of traces. Averaged over Qwen3 and five representative reasoning benchmarks, temperature scaling yields an additional 7.3 points over single-temperature TTS. Temperature scaling also enables base models to reach performance comparable to reinforcement learning (RL)-trained counterparts, without additional post-training.
arXiv Detail & Related papers (2025-10-02T23:09:56Z)
- Exploring the Impact of Temperature on Large Language Models: Hot or Cold? [9.70280446429164]
We evaluate the impact of temperature in the range of 0 to 2 on data sets designed to assess six different capabilities. Our findings reveal skill-specific effects of temperature on model performance, highlighting the complexity of optimal temperature selection. We propose a BERT-based temperature selector that takes advantage of these observed effects to identify the optimal temperature for a given prompt.
arXiv Detail & Related papers (2025-06-08T21:36:26Z)
- Extended string-net models with all anyons at finite temperature [0.0]
In the original string-net model, the description of charge excitations can be problematic. We compute the spectral degeneracies of excited states and obtain the exact partition function. In a finite-size system, order survives up to a finite temperature, revealing a nontrivial scaling between temperature and size.
arXiv Detail & Related papers (2025-02-03T15:43:19Z)
- Scaling Laws in Linear Regression: Compute, Parameters, and Data [86.48154162485712]
We study the theory of scaling laws in an infinite-dimensional linear regression setup. We show that the reducible part of the test error is $\Theta(M^{-(a-1)} + N^{-(a-1)/a})$, where $M$ is the number of parameters and $N$ the number of data points. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation.
arXiv Detail & Related papers (2024-06-12T17:53:29Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Long Horizon Temperature Scaling [90.03310732189543]
Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long horizon temperature parameter.
arXiv Detail & Related papers (2023-02-07T18:59:32Z)
- Uhlmann Fidelity and Fidelity Susceptibility for Integrable Spin Chains at Finite Temperature: Exact Results [68.8204255655161]
We show that the proper inclusion of the odd parity subspace leads to the enhancement of maximal fidelity susceptibility in the intermediate range of temperatures.
The correct low-temperature behavior is captured by an approximation involving the two lowest many-body energy eigenstates.
arXiv Detail & Related papers (2021-05-11T14:08:02Z)
- Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration [57.568461777747515]
We introduce a novel calibration method, Parameterized Temperature Scaling (PTS).
We demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power.
We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
arXiv Detail & Related papers (2021-02-24T10:18:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.