Optimizing Temperature for Language Models with Multi-Sample Inference
- URL: http://arxiv.org/abs/2502.05234v1
- Date: Fri, 07 Feb 2025 19:35:25 GMT
- Title: Optimizing Temperature for Language Models with Multi-Sample Inference
- Authors: Weihua Du, Yiming Yang, Sean Welleck
- Abstract summary: This paper addresses the challenge of automatically identifying the (near)-optimal temperature for different large language models.
We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy.
We propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines.
- Score: 47.14991144052361
- License:
- Abstract: Multi-sample aggregation strategies, such as majority voting and best-of-N sampling, are widely used in contemporary large language models (LLMs) to enhance predictive accuracy across various tasks. A key challenge in this process is temperature selection, which significantly impacts model performance. Existing approaches either rely on a fixed default temperature or require labeled validation data for tuning, which are often scarce and difficult to obtain. This paper addresses the challenge of automatically identifying the (near)-optimal temperature for different LLMs using multi-sample aggregation strategies, without relying on task-specific validation data. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. Furthermore, we propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines. Additionally, we incorporate a stochastic process model to enhance interpretability, offering deeper insights into the relationship between temperature and model performance.
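To make the setting concrete, here is a minimal sketch (not the authors' implementation) of majority-vote aggregation over N answers sampled at a candidate temperature, scored with the entropy of the empirical answer distribution as an unsupervised stand-in for the paper's proposed metric. The `generate_answer` callable and the min-entropy selection rule are assumptions for illustration only.

```python
from collections import Counter
import math

def majority_vote(answers):
    """Aggregate N sampled answers by majority voting."""
    return Counter(answers).most_common(1)[0][0]

def answer_entropy(answers):
    """Shannon entropy of the empirical distribution over final answers."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())

def pick_temperature(generate_answer, prompt, temperatures, n_samples=16):
    """Choose a temperature without labeled validation data.

    `generate_answer(prompt, temperature)` is a hypothetical sampler that
    returns one final answer string per call; the entropy score below is
    only an illustrative stand-in for the paper's metric.
    """
    scores = {}
    for t in temperatures:
        answers = [generate_answer(prompt, t) for _ in range(n_samples)]
        scores[t] = answer_entropy(answers)
    return min(scores, key=scores.get)
```

In this toy version, the temperature whose samples agree most is selected; the paper's actual criterion is more refined and is validated across model families, sizes, and task types.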
Related papers
- Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures.
arXiv Detail & Related papers (2024-11-14T18:31:39Z)
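A rough sketch of the Adaptive Decoding idea above: a small head maps the decoder's hidden state to a per-step temperature by softly mixing a few candidate values. The hidden size, candidate temperatures, and mixture parameterization are illustrative assumptions, not the paper's architecture or its latent-preference training.

```python
import torch
import torch.nn as nn

class TemperatureHead(nn.Module):
    """Toy per-step temperature selector attached to a decoder.

    Scores a fixed set of candidate temperatures from the current hidden
    state and rescales the next-token logits by their soft mixture.
    """

    def __init__(self, hidden_size=768, temperatures=(0.6, 0.8, 1.0, 1.2)):
        super().__init__()
        self.register_buffer("temps", torch.tensor(temperatures))
        self.scorer = nn.Linear(hidden_size, len(temperatures))

    def forward(self, hidden_state, logits):
        # hidden_state: (batch, hidden_size); logits: (batch, vocab_size)
        weights = torch.softmax(self.scorer(hidden_state), dim=-1)      # (batch, K)
        temperature = (weights * self.temps).sum(dim=-1, keepdim=True)  # (batch, 1)
        return logits / temperature.clamp_min(1e-4)
```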
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
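To make the UAL mechanism above concrete, here is a minimal sketch of cross-entropy with a per-sample label-smoothing value driven by an uncertainty score in [0, 1]; the linear mapping and the 0.2 cap are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def uncertainty_smoothed_loss(logits, targets, uncertainty, max_eps=0.2):
    """Cross-entropy with a per-sample label-smoothing value.

    Args:
        logits: (batch, num_classes) raw scores.
        targets: (batch,) integer class labels.
        uncertainty: (batch,) per-sample uncertainty in [0, 1]; more
            uncertain samples receive more smoothing (illustrative rule).
    """
    eps = max_eps * uncertainty                                    # (batch,)
    log_probs = F.log_softmax(logits, dim=-1)                      # (batch, C)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)    # (batch,)
    uniform = -log_probs.mean(dim=-1)                              # (batch,)
    return ((1.0 - eps) * nll + eps * uniform).mean()
```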
- EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling [31.663507929452564]
We propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to balance generation quality and diversity.
Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
arXiv Detail & Related papers (2024-03-21T16:41:12Z)
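The following is a minimal sketch of the general idea behind entropy-based dynamic temperature sampling: the temperature at each decoding step is made a function of the entropy of the next-token distribution. The specific power-law mapping below is an assumption for illustration, not EDT's published schedule.

```python
import torch

def entropy_scaled_sample(logits, base_temp=1.0, sensitivity=0.5, eps=1e-12):
    """Sample one token with a temperature tied to predictive entropy.

    One plausible mapping (illustrative, not EDT's exact formula): the
    temperature shrinks toward greedy decoding when the model is confident
    (low entropy) and approaches `base_temp` when it is uncertain.
    """
    probs = torch.softmax(logits, dim=-1)                              # (batch, vocab)
    entropy = -(probs * probs.clamp_min(eps).log()).sum(dim=-1)        # (batch,)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    temp = base_temp * (entropy / max_entropy) ** sensitivity          # (batch,)
    temp = temp.clamp_min(1e-4).unsqueeze(-1)                          # (batch, 1)
    return torch.multinomial(torch.softmax(logits / temp, dim=-1), 1)  # (batch, 1)
```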
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators [0.0]
This paper presents a data-driven model that captures the local temperature evolution during the additive manufacturing process.
It is tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process.
The results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
arXiv Detail & Related papers (2023-07-04T16:17:59Z)
- Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization [51.41175648612714]
We propose a new robust contrastive loss inspired by distributionally robust optimization (DRO).
We show that our algorithm automatically learns a suitable $\tau$ for each sample.
Our method outperforms prior strong baselines on unimodal and bimodal datasets.
arXiv Detail & Related papers (2023-05-19T19:25:56Z)
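As a sketch of what an individualized temperature looks like in a contrastive objective, the InfoNCE-style loss below keeps one learnable log-temperature per training sample; the direct-learning parameterization is an assumption, whereas the paper above derives each $\tau$ from a distributionally robust optimization objective.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerSampleTauInfoNCE(nn.Module):
    """InfoNCE-style loss with one learnable temperature per sample.

    The per-sample log-tau table is an illustrative parameterization; the
    DRO-based method obtains individualized temperatures from a
    robust-optimization objective rather than learning them directly.
    """

    def __init__(self, num_samples, init_tau=0.1):
        super().__init__()
        self.log_tau = nn.Parameter(torch.full((num_samples,), math.log(init_tau)))

    def forward(self, anchors, positives, sample_ids):
        # anchors, positives: (batch, dim) L2-normalized embeddings;
        # sample_ids: (batch,) indices into the per-sample temperature table.
        tau = self.log_tau[sample_ids].exp().unsqueeze(-1)   # (batch, 1)
        logits = anchors @ positives.t() / tau               # (batch, batch)
        targets = torch.arange(anchors.size(0), device=anchors.device)
        return F.cross_entropy(logits, targets)
```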
- Long Horizon Temperature Scaling [90.03310732189543]
Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long horizon temperature parameter.
arXiv Detail & Related papers (2023-02-07T18:59:32Z)
- Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.