EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
- URL: http://arxiv.org/abs/2403.14541v2
- Date: Wed, 3 Apr 2024 16:09:22 GMT
- Title: EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
- Authors: Shimao Zhang, Yu Bao, Shujian Huang
- Abstract summary: We propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to balance generation quality and diversity.
Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
- Score: 31.663507929452564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy in LLM generation. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method that achieves a more balanced trade-off between generation quality and diversity by dynamically selecting the temperature parameter. We also report model performance and comprehensive analyses on four different generation benchmarks. Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
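The core idea of entropy-based dynamic temperature can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the linear mapping from normalized entropy to temperature, and the bounds `t_min`/`t_max`, are assumptions chosen for clarity.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = logits / temperature
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a probability distribution."""
    p = np.clip(probs, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def edt_sample(logits, t_min=0.3, t_max=1.5, rng=None):
    """Sample one token with an entropy-dependent temperature.

    Intuition: when the model is uncertain (high entropy), use a lower
    temperature to preserve quality; when the model is confident (low
    entropy), use a higher temperature to encourage diversity. The
    linear interpolation below is an illustrative assumption, not the
    mapping used in the paper.
    """
    rng = rng or np.random.default_rng()
    base = softmax(logits)               # distribution at T = 1
    h = entropy(base)
    h_max = np.log(len(logits))          # maximum possible entropy
    t = t_max - (t_max - t_min) * (h / h_max)
    probs = softmax(logits, temperature=t)
    token = int(rng.choice(len(logits), p=probs))
    return token, t
```

For example, a near-uniform logit vector (maximal entropy) is sampled at `t_min`, while a sharply peaked one is sampled near `t_max`.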
Related papers
- Critical Phase Transition in a Large Language Model [0.0]
We numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities.
Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature.
arXiv Detail & Related papers (2024-06-08T03:37:05Z) - Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z) - One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging For Cross-Lingual Transfer [61.455775535559276]
We propose an unsupervised evaluation protocol for ZS-XLT.
We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER).
We find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance.
arXiv Detail & Related papers (2023-10-16T15:50:34Z) - Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators [0.0]
This paper presents a data-driven model that captures the local temperature evolution during the additive manufacturing process.
It is tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process.
The results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
arXiv Detail & Related papers (2023-07-04T16:17:59Z) - Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization [51.41175648612714]
We propose a new robust contrastive loss inspired by distributionally robust optimization (DRO).
We show that our algorithm automatically learns a suitable $\tau$ for each sample.
Our method outperforms prior strong baselines on unimodal and bimodal datasets.
arXiv Detail & Related papers (2023-05-19T19:25:56Z) - Long Horizon Temperature Scaling [90.03310732189543]
Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long horizon temperature parameter.
arXiv Detail & Related papers (2023-02-07T18:59:32Z) - Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z) - Generating Multivariate Load States Using a Conditional Variational Autoencoder [11.557259513691239]
A conditional variational autoencoder (CVAE) neural network is proposed in this paper.
The model includes latent variation of output samples under given latent vectors and co-optimizes the parameters for this output variability.
Experiments demonstrate that the proposed generator outperforms other data generating mechanisms.
arXiv Detail & Related papers (2021-10-21T19:07:04Z) - UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning [64.638804236566]
We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup.
Remarkably, on the GLUE benchmark, UniPELT consistently achieves 13pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
arXiv Detail & Related papers (2021-10-14T17:40:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the generated list (including all information) and is not responsible for any consequences of its use.