KL-Divergence Guided Temperature Sampling
- URL: http://arxiv.org/abs/2306.01286v2
- Date: Wed, 29 Nov 2023 23:57:03 GMT
- Title: KL-Divergence Guided Temperature Sampling
- Authors: Chung-Ching Chang, David Reitter, Renat Aksitov, Yun-Hsuan Sung
- Abstract summary: As temperature increases, the prediction becomes diverse but also vulnerable to hallucinations.
One common approach to mitigate hallucinations is to provide source/grounding documents.
We propose relaxing the constraint of a fixed temperature over decoding steps and introduce a mechanism that guides the dynamic temperature according to each step's relevance to the source.
- Score: 5.726259957909055
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Temperature sampling is a conventional approach to diversify large language
model predictions. As temperature increases, the prediction becomes diverse but
also vulnerable to hallucinations -- generating tokens that are sensible but
not factual. One common approach to mitigate hallucinations is to provide
source/grounding documents and the model is trained to produce predictions that
bind to and are attributable to the provided source. It appears that there is a
trade-off between diversity and attribution. To mitigate any such trade-off, we
propose to relax the constraint of having a fixed temperature over decoding
steps, and a mechanism to guide the dynamic temperature according to its
relevance to the source through KL-divergence. Our experiments justifies the
trade-off, and shows that our sampling algorithm outperforms the conventional
top-k and top-p algorithms in conversational question-answering and
summarization tasks.
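To make the mechanism concrete, here is a minimal sketch of KL-guided dynamic temperature at a single decoding step. It assumes access to next-token logits both with and without the source in context; the KL-to-temperature mapping (the `t_min`/`t_max` bounds and the exponential squashing) is an illustrative assumption, not the paper's exact formula. In this reading, a step whose distribution shifts sharply when the source is present is treated as source-relevant and sampled conservatively, while source-agnostic steps keep a higher temperature for diversity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a logits vector."""
    z = logits / temperature
    z = z - z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def dynamic_temperature(logits_with_source, logits_without_source,
                        t_min=0.3, t_max=1.5, scale=1.0):
    """Map the KL-divergence between the source-conditioned and
    unconditioned next-token distributions to a per-step temperature.
    Assumed mapping: high KL (the source strongly constrains this
    token) -> low temperature; low KL -> high temperature."""
    p_cond = softmax(logits_with_source)
    p_uncond = softmax(logits_without_source)
    kl = kl_divergence(p_cond, p_uncond)
    w = 1.0 - np.exp(-scale * kl)  # squash KL into (0, 1)
    return t_max - w * (t_max - t_min)

def sample_next_token(logits_with_source, logits_without_source, rng):
    """One decoding step with KL-guided temperature sampling."""
    t = dynamic_temperature(logits_with_source, logits_without_source)
    p = softmax(logits_with_source, temperature=t)
    return rng.choice(len(p), p=p), t

rng = np.random.default_rng(0)
vocab = 8
logits_src = 3.0 * rng.normal(size=vocab)   # peaked: a source-relevant step
logits_free = rng.normal(size=vocab)
token, t = sample_next_token(logits_src, logits_free, rng)
print(f"sampled token {token} at temperature {t:.2f}")
```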
Related papers
- Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
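For context, classifier-free guidance (CFG) combines conditional and unconditional denoiser outputs with coefficients that conventionally sum to one; the sketch below contrasts that with the decoupled coefficients this summary describes. The coefficient values and the denoiser stand-ins are hypothetical; ReCFG pre-computes its coefficients from observed data.

```python
import numpy as np

def cfg_combine(eps_cond, eps_uncond, w=3.0):
    """Conventional classifier-free guidance: the coefficients
    (1 + w) and -w sum to one."""
    return (1.0 + w) * eps_cond - w * eps_uncond

def recfg_combine(eps_cond, eps_uncond, w_cond, w_uncond):
    """Relaxed combination in the spirit of ReCFG: the two
    coefficients are decoupled and need not sum to one."""
    return w_cond * eps_cond + w_uncond * eps_uncond

# Hypothetical denoiser outputs at one sampling step.
rng = np.random.default_rng(1)
eps_c = rng.normal(size=16)
eps_u = rng.normal(size=16)
out_cfg = cfg_combine(eps_c, eps_u)
out_recfg = recfg_combine(eps_c, eps_u, w_cond=3.6, w_uncond=-2.4)
```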
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Temperature Optimization for Bayesian Deep Learning [9.610060788662972]
We propose a data-driven approach to select the temperature that maximizes test log-predictive density.
We empirically demonstrate that our method performs comparably to grid search, at a fraction of the cost.
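The selection criterion in this summary is straightforward to sketch: scale held-out logits by each candidate temperature and keep the one maximizing average log-predictive density. The grid search below is the baseline the paper compares against, not its cheaper data-driven method; the data is synthetic.

```python
import numpy as np

def log_predictive_density(logits, labels, temperature):
    """Mean log-probability of held-out labels under
    temperature-scaled softmax predictions."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(labels)), labels].mean()

def select_temperature(logits, labels, grid=None):
    """Baseline grid search: pick the temperature maximizing test
    log-predictive density."""
    grid = np.linspace(0.1, 5.0, 50) if grid is None else grid
    scores = [log_predictive_density(logits, labels, t) for t in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(2)
logits = 4.0 * rng.normal(size=(100, 10))  # hypothetical held-out logits
labels = rng.integers(0, 10, size=100)
print(f"selected temperature: {select_temperature(logits, labels):.2f}")
```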
arXiv Detail & Related papers (2024-10-08T07:32:22Z)
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
- Bayesian Conditional Diffusion Models for Versatile Spatiotemporal Turbulence Generation [13.278744447861289]
We introduce a novel generative framework grounded in probabilistic diffusion models for turbulence generation.
A notable feature of our approach is the proposed method for long-span flow sequence generation, which is based on autoregressive conditional sampling.
We showcase the versatile turbulence generation capability of our framework through a suite of numerical experiments.
arXiv Detail & Related papers (2023-11-14T04:08:14Z)
- Dynamically Scaled Temperature in Self-Supervised Contrastive Learning [11.133502139934437]
We focus on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function.
Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms.
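A minimal sketch of the idea in this summary, assuming a simple linear scaling of the temperature in cosine similarity (the paper's exact scaling function may differ):

```python
import torch
import torch.nn.functional as F

def dynamic_tau(sim, tau_base=0.1, alpha=0.5):
    """Cosine-similarity-dependent temperature (illustrative form
    only): lower-similarity (harder) pairs get a lower temperature,
    which sharpens their contribution to the loss."""
    return tau_base * (1.0 + alpha * sim)

def info_nce_dynamic(z1, z2):
    """InfoNCE between two views of a batch, with a per-pair dynamic
    temperature instead of a single constant."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.T                      # pairwise cosine similarities
    logits = sim / dynamic_tau(sim)      # element-wise temperature scaling
    targets = torch.arange(z1.size(0))   # positives on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(f"loss: {info_nce_dynamic(z1, z2).item():.3f}")
```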
arXiv Detail & Related papers (2023-08-02T13:31:41Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- ShiftDDPMs: Exploring Conditional Diffusion Models by Shifting Diffusion Trajectories [144.03939123870416]
We propose a novel conditional diffusion model by introducing conditions into the forward process.
We use extra latent space to allocate an exclusive diffusion trajectory for each condition based on some shifting rules.
We formulate our method, which we call ShiftDDPMs, and provide a unified point of view on existing related methods.
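One plausible reading of "introducing conditions into the forward process", sketched under stated assumptions: the forward marginal's mean is shifted by a condition embedding along a schedule, giving each condition its own diffusion trajectory. The shift schedule `k` and the condition embedding here are illustrative stand-ins, not the paper's exact shifting rule.

```python
import torch

def shifted_forward_sample(x0, cond_embed, t, alpha_bar, k):
    """Sample x_t from a condition-shifted forward marginal:
    x_t = sqrt(abar_t) * x0 + k_t * E(c) + sqrt(1 - abar_t) * eps.
    The shift schedule k and embedding E(c) are assumptions."""
    eps = torch.randn_like(x0)
    ab = alpha_bar[t]
    return ab.sqrt() * x0 + k[t] * cond_embed + (1.0 - ab).sqrt() * eps

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)
k = torch.linspace(0.0, 1.0, T)   # shift grows along the trajectory
x0 = torch.randn(4, 8)            # toy data batch
cond = torch.randn(4, 8)          # hypothetical condition embeddings
xt = shifted_forward_sample(x0, cond, t=500, alpha_bar=alpha_bar, k=k)
print(xt.shape)
```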
arXiv Detail & Related papers (2023-02-05T12:48:21Z)
- Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to make extractions genuinely based on the contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z)
- Uncertainty Quantification for Traffic Forecasting: A Unified Approach [21.556559649467328]
Uncertainty is an essential consideration for time series forecasting tasks.
In this work, we focus on quantifying the uncertainty of traffic forecasting.
We develop Deep Spatio-Temporal Uncertainty Quantification (DeepSTUQ), which can estimate both aleatoric and epistemic uncertainty.
arXiv Detail & Related papers (2022-08-11T15:21:53Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)