Long Horizon Temperature Scaling
- URL: http://arxiv.org/abs/2302.03686v2
- Date: Fri, 29 Sep 2023 18:44:40 GMT
- Title: Long Horizon Temperature Scaling
- Authors: Andy Shih, Dorsa Sadigh, Stefano Ermon
- Abstract summary: Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long horizon temperature parameter.
- Score: 90.03310732189543
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temperature scaling is a popular technique for tuning the sharpness of a
model distribution. It is used extensively for sampling likely generations and
calibrating model uncertainty, and even features as a controllable parameter to
many large language models in deployment. However, autoregressive models rely
on myopic temperature scaling that greedily optimizes the next token. To
address this, we propose Long Horizon Temperature Scaling (LHTS), a novel
approach for sampling from temperature-scaled joint distributions. LHTS is
compatible with all likelihood-based models, and optimizes for the long horizon
likelihood of samples. We derive a temperature-dependent LHTS objective, and
show that finetuning a model on a range of temperatures produces a single model
capable of generation with a controllable long horizon temperature parameter.
We experiment with LHTS on image diffusion models and character/language
autoregressive models, demonstrating advantages over myopic temperature scaling
in likelihood and sample quality, and showing improvements in accuracy on a
multiple choice analogy task by $10\%$.
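The gap between myopic and joint temperature scaling described in the abstract can be seen on a toy example. The sketch below is not the LHTS finetuning objective from the paper. It enumerates every sequence of a tiny two-token model and compares scaling each conditional separately with scaling the joint distribution directly. The probabilities and the helper names (`sharpen`, `myopic_joint`, `long_horizon_joint`) are assumptions chosen for illustration.

```python
import itertools

# Hypothetical toy autoregressive model over two binary tokens.
# These probabilities are illustrative assumptions, not from the paper.
P_FIRST = {0: 0.6, 1: 0.4}                      # p(x1)
P_SECOND = {0: {0: 0.5, 1: 0.5},                # p(x2 | x1 = 0)
            1: {0: 0.9, 1: 0.1}}                # p(x2 | x1 = 1)


def sharpen(dist, temperature):
    """Raise each probability to the power 1/T and renormalize."""
    weights = {k: p ** (1.0 / temperature) for k, p in dist.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def myopic_joint(temperature):
    """Scale every conditional separately, then multiply them."""
    first = sharpen(P_FIRST, temperature)
    joint = {}
    for x1, x2 in itertools.product((0, 1), repeat=2):
        joint[(x1, x2)] = first[x1] * sharpen(P_SECOND[x1], temperature)[x2]
    return joint


def long_horizon_joint(temperature):
    """Scale the joint p(x1, x2) directly (exact enumeration)."""
    joint = {(x1, x2): P_FIRST[x1] * P_SECOND[x1][x2]
             for x1, x2 in itertools.product((0, 1), repeat=2)}
    return sharpen(joint, temperature)


if __name__ == "__main__":
    for name, fn in [("myopic", myopic_joint),
                     ("long horizon", long_horizon_joint)]:
        print(name, {seq: round(p, 3) for seq, p in fn(0.5).items()})
```

In this toy model the most likely sequence is (1, 0), with probability 0.36. At temperature 0.5, myopic scaling lowers its probability to about 0.30, because the first step greedily favors token 0, while scaling the joint raises it to about 0.42. Exact enumeration is only feasible for tiny sequence spaces, which is the setting where a learned approach such as LHTS becomes relevant.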
Related papers
- Adaptive Decoding via Latent Preference Optimization [55.70602730588745]
We introduce Adaptive Decoding, a layer added to the model to select the sampling temperature dynamically at inference time.
Our method outperforms all fixed decoding temperatures across a range of tasks that require different temperatures.
arXiv Detail & Related papers (2024-11-14T18:31:39Z) - Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3$\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z) - Deep generative modelling of canonical ensemble with differentiable thermal properties [0.9421843976231371]
We propose a variational modelling method with differentiable temperature for canonical ensembles.
Using a deep generative model, the free energy is estimated and minimized simultaneously in a continuous temperature range.
The training process requires no dataset, and works with arbitrary explicit density generative models.
arXiv Detail & Related papers (2024-04-29T03:41:49Z) - EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling [31.663507929452564]
We propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to balance generation quality and diversity.
Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
arXiv Detail & Related papers (2024-03-21T16:41:12Z) - Temperature dependence of energy transport in the $\mathbb{Z}_3$ chiral clock model [0.0]
We study energy transport within the non-integrable regime of the one-dimensional $\mathbb{Z}_3$ chiral clock model.
We extract the transport coefficients of the model at relatively high temperatures above both its gapless and gapped low-temperature phases.
Although we are not yet able to reach temperatures where quantum critical scaling would be observed, our approach is able to access the transport properties of the model.
arXiv Detail & Related papers (2023-10-31T18:00:30Z) - Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators [0.0]
This paper presents a data-driven model that captures the local temperature evolution during the additive manufacturing process.
It is tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process.
The results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
arXiv Detail & Related papers (2023-07-04T16:17:59Z) - Bi-Noising Diffusion: Towards Conditional Diffusion Models with Generative Restoration Priors [64.24948495708337]
We introduce a new method that brings predicted samples to the training data manifold using a pretrained unconditional diffusion model.
We perform comprehensive experiments to demonstrate the effectiveness of our approach on super-resolution, colorization, turbulence removal, and image-deraining tasks.
arXiv Detail & Related papers (2022-12-14T17:26:35Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z) - VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven Model Interpretability Applied to the Ironmaking Industry [70.10343492784465]
The process engineer needs to see not only the model predictions but also their interpretability.
Model-agnostic local interpretability solutions based on LIME have recently emerged to improve the original method.
We present in this paper a novel approach, VAE-LIME, for local interpretability of data-driven models forecasting the temperature of the hot metal produced by a blast furnace.
arXiv Detail & Related papers (2020-07-15T07:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.