KL-Divergence Guided Temperature Sampling
- URL: http://arxiv.org/abs/2306.01286v2
- Date: Wed, 29 Nov 2023 23:57:03 GMT
- Title: KL-Divergence Guided Temperature Sampling
- Authors: Chung-Ching Chang, David Reitter, Renat Aksitov, Yun-Hsuan Sung
- Abstract summary: As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations.
One common approach to mitigate hallucinations is to provide source/grounding documents.
We propose relaxing the constraint of a fixed temperature over decoding steps and introduce a mechanism that guides the dynamic temperature according to each step's relevance to the source.
- Score: 5.726259957909055
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Temperature sampling is a conventional approach to diversify large language
model predictions. As temperature increases, the prediction becomes diverse but
also vulnerable to hallucinations -- generating tokens that are sensible but
not factual. One common approach to mitigate hallucinations is to provide
source/grounding documents and the model is trained to produce predictions that
bind to and are attributable to the provided source. It appears that there is a
trade-off between diversity and attribution. To mitigate any such trade-off, we
propose to relax the constraint of having a fixed temperature over decoding
steps, and a mechanism to guide the dynamic temperature according to its
relevance to the source through KL-divergence. Our experiments justifies the
trade-off, and shows that our sampling algorithm outperforms the conventional
top-k and top-p algorithms in conversational question-answering and
summarization tasks.
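To make the mechanism concrete, here is a minimal sketch of KL-guided dynamic temperature at a single decoding step. It assumes access to next-token logits both with and without the source in context; the KL-to-temperature mapping (the `t_min`/`t_max` bounds and the exponential squashing) is an illustrative assumption, not the paper's exact formula. In this reading, a step whose distribution shifts sharply when the source is present is treated as source-relevant and sampled conservatively, while source-agnostic steps keep a higher temperature for diversity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a logits vector."""
    z = logits / temperature
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def dynamic_temperature(logits_with_source, logits_without_source,
                        t_min=0.3, t_max=1.5, scale=1.0):
    """Map the KL-divergence between the source-conditioned and
    unconditioned next-token distributions to a per-step temperature.
    Assumed mapping: high KL (the source strongly constrains this
    token) -> low temperature; low KL -> high temperature."""
    p_cond = softmax(logits_with_source)
    p_uncond = softmax(logits_without_source)
    kl = kl_divergence(p_cond, p_uncond)
    w = 1.0 - np.exp(-scale * kl)  # squash KL into (0, 1)
    return t_max - w * (t_max - t_min)

def sample_next_token(logits_with_source, logits_without_source, rng):
    """One decoding step with KL-guided temperature sampling."""
    t = dynamic_temperature(logits_with_source, logits_without_source)
    p = softmax(logits_with_source, temperature=t)
    return rng.choice(len(p), p=p), t

rng = np.random.default_rng(0)
vocab = 8
logits_src = 3.0 * rng.normal(size=vocab)   # peaked: a source-relevant step
logits_free = rng.normal(size=vocab)
token, t = sample_next_token(logits_src, logits_free, rng)
print(f"sampled token {token} at temperature {t:.2f}")
```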
Related papers
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
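For context, classifier-free guidance (CFG) combines conditional and unconditional denoiser outputs with coefficients that conventionally sum to one; the sketch below contrasts that with the decoupled coefficients this summary describes. The coefficient values and the denoiser stand-ins are hypothetical; ReCFG pre-computes its coefficients from observed data.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w=3.0):
    """Conventional classifier-free guidance: the coefficients
    (1 + w) and -w sum to one."""
    return (1.0 + w) * eps_cond - w * eps_uncond

def recfg_combine(eps_cond, eps_uncond, w_cond, w_uncond):
    """Relaxed combination in the spirit of ReCFG: the two
    coefficients are decoupled and need not sum to one."""
    return w_cond * eps_cond + w_uncond * eps_uncond

# Hypothetical denoiser outputs at one sampling step.
rng = np.random.default_rng(1)
eps_c = rng.normal(size=16)
eps_u = rng.normal(size=16)
out_cfg = cfg_combine(eps_c, eps_u)
out_recfg = recfg_combine(eps_c, eps_u, w_cond=3.6, w_uncond=-2.4)
```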
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Temperature Optimization for Bayesian Deep Learning [9.610060788662972]
We propose a data-driven approach to select the temperature that maximizes test log-predictive density.
We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost.
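The selection criterion in this summary is straightforward to sketch: scale held-out logits by each candidate temperature and keep the one maximizing average log-predictive density. The grid search below is the baseline the paper compares against, not its cheaper data-driven method; the data is synthetic.

```python
import numpy as np

def log_predictive_density(logits, labels, temperature):
    """Mean log-probability of held-out labels under
    temperature-scaled softmax predictions."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(labels)), labels].mean()

def select_temperature(logits, labels, grid=None):
    """Baseline grid search: pick the temperature maximizing test
    log-predictive density."""
    grid = np.linspace(0.1, 5.0, 50) if grid is None else grid
    scores = [log_predictive_density(logits, labels, t) for t in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(2)
logits = 4.0 * rng.normal(size=(100, 10))  # hypothetical held-out logits
labels = rng.integers(0, 10, size=100)
print(f"selected temperature: {select_temperature(logits, labels):.2f}")
```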
arXiv Detail & Related papers (2024-10-08T07:32:22Z)
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
- Bayesian Conditional Diffusion Models for Versatile Spatiotemporal Turbulence Generation [13.278744447861289]
We introduce a novel generative framework grounded in probabilistic diffusion models for turbulence generation.
A notable feature of our approach is the proposed method for long-span flow sequence generation, which is based on autoregressive conditional sampling.
We showcase the versatile turbulence generation capability of our framework through a suite of numerical experiments.
arXiv Detail & Related papers (2023-11-14T04:08:14Z)
- Dynamically Scaled Temperature in Self-Supervised Contrastive Learning [11.133502139934437]
We focus on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function.
Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms.
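A minimal sketch of the idea in this summary, assuming a simple linear scaling of the temperature in cosine similarity (the paper's exact scaling function may differ):

```python
import torch
import torch.nn.functional as F

def dynamic_tau(sim, tau_base=0.1, alpha=0.5):
    """Cosine-similarity-dependent temperature (illustrative form
    only): lower-similarity (harder) pairs get a lower temperature,
    which sharpens their contribution to the loss."""
    return tau_base * (1.0 + alpha * sim)

def info_nce_dynamic(z1, z2):
    """InfoNCE between two views of a batch, with a per-pair dynamic
    temperature instead of a single constant."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.T                      # pairwise cosine similarities
    logits = sim / dynamic_tau(sim)      # element-wise temperature scaling
    targets = torch.arange(z1.size(0))   # positives on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(f"loss: {info_nce_dynamic(z1, z2).item():.3f}")
```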
arXiv Detail & Related papers (2023-08-02T13:31:41Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories [144.03939123870416]
We propose a novel conditional diffusion model by introducing conditions into the forward process.
We use extra latent space to allocate an exclusive diffusion trajectory for each condition based on some shifting rules.
We formulate our method, which we call ShiftDDPMs, and provide a unified point of view on existing related methods.
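One plausible reading of "introducing conditions into the forward process", sketched under stated assumptions: the forward marginal's mean is shifted by a condition embedding along a schedule, giving each condition its own diffusion trajectory. The shift schedule `k` and the condition embedding here are illustrative stand-ins, not the paper's exact shifting rule.

```python
import torch

def shifted_forward_sample(x0, cond_embed, t, alpha_bar, k):
    """Sample x_t from a condition-shifted forward marginal:
    x_t = sqrt(abar_t) * x0 + k_t * E(c) + sqrt(1 - abar_t) * eps.
    The shift schedule k and embedding E(c) are assumptions."""
    eps = torch.randn_like(x0)
    ab = alpha_bar[t]
    return ab.sqrt() * x0 + k[t] * cond_embed + (1.0 - ab).sqrt() * eps

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
k = torch.linspace(0.0, 1.0, T)   # shift grows along the trajectory
x0 = torch.randn(4, 8)            # toy data batch
cond = torch.randn(4, 8)          # hypothetical condition embeddings
xt = shifted_forward_sample(x0, cond, t=500, alpha_bar=alpha_bar, k=k)
print(xt.shape)
```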
arXiv Detail & Related papers (2023-02-05T12:48:21Z)
- Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to make extractions genuinely based on the contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z)
- Uncertainty Quantification for Traffic Forecasting: A Unified Approach [21.556559649467328]
Uncertainty is an essential consideration for time series forecasting tasks.
In this work, we focus on quantifying the uncertainty of traffic forecasting.
We develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty.
arXiv Detail & Related papers (2022-08-11T15:21:53Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)