EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
- URL: http://arxiv.org/abs/2403.14541v2
- Date: Wed, 3 Apr 2024 16:09:22 GMT
- Title: EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling
- Authors: Shimao Zhang, Yu Bao, Shujian Huang
- Abstract summary: We propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to balance generation quality and diversity.
Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
- Score: 31.663507929452564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy in LLMs' generation process. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method to achieve a more balanced trade-off between generation quality and diversity by dynamically selecting the temperature parameter. We also report model performance and comprehensive analyses on four different generation benchmarks. Our experiments show that EDT significantly outperforms the existing strategies across different tasks.
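To make the idea concrete, below is a minimal sketch of entropy-based dynamic temperature selection for a single decoding step. The specific monotone mapping from normalized entropy to temperature, and the `base_temp`, `min_temp`, and `sensitivity` parameters, are illustrative assumptions for this sketch and not necessarily the exact formulation used in the paper.

```python
# Minimal sketch of entropy-based dynamic temperature sampling (illustrative;
# the entropy-to-temperature mapping below is an assumption, not the paper's formula).
import math

import torch
import torch.nn.functional as F


def sample_with_dynamic_temperature(logits: torch.Tensor,
                                    base_temp: float = 1.0,
                                    min_temp: float = 0.1,
                                    sensitivity: float = 1.0):
    """Sample one token, choosing the temperature from the distribution's entropy.

    logits: 1-D tensor of unnormalized next-token scores (vocab_size,).
    Returns (token_id, temperature).
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    norm_entropy = float(entropy) / math.log(logits.numel())  # in [0, 1]

    # Illustrative monotone mapping: confident steps (low entropy) sample
    # near-greedily, uncertain steps (high entropy) approach the base temperature.
    temperature = min_temp + (base_temp - min_temp) * norm_entropy ** sensitivity

    token_id = torch.multinomial(F.softmax(logits / temperature, dim=-1), num_samples=1)
    return token_id.item(), temperature
```

In a decoding loop, this replaces the fixed-temperature softmax at every step; `base_temp` then bounds how much diversity the sampler can introduce when the model is uncertain.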
Related papers
- Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step.
Our framework offers a 1.3$\times$ sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
- Conditional LoRA Parameter Generation [18.34892473337235]
We propose COND P-DIFF, a novel approach that demonstrates the feasibility of controllable high-performance parameter generation.
Experimental results in both computer vision and natural language processing domains consistently demonstrate that COND P-DIFF can generate high-performance parameters conditioned on the given task.
Our work paves the way for further exploration of condition-driven parameter generation, offering a promising direction for task-specific adaptation of neural networks.
arXiv Detail & Related papers (2024-08-02T17:43:34Z)
- Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training [58.20089993899729]
This paper proposes TempBalance, a straightforward yet effective layerwise learning rate method.
We show that TempBalance significantly outperforms ordinary SGD and carefully tuned spectral norm regularization.
We also show that TempBalance outperforms a number of state-of-the-art metrics and schedulers.
arXiv Detail & Related papers (2023-12-01T05:38:17Z)
- Capturing Local Temperature Evolution during Additive Manufacturing through Fourier Neural Operators [0.0]
This paper presents a data-driven model that captures the local temperature evolution during the additive manufacturing process.
It is tested on numerical simulations based on the Discontinuous Galerkin Finite Element Method for the Direct Energy Deposition process.
The results demonstrate that the model achieves high fidelity as measured by $R^2$ and maintains generalizability to geometries that were not included in the training process.
arXiv Detail & Related papers (2023-07-04T16:17:59Z)
- Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization [51.41175648612714]
We propose a new robust contrastive loss inspired by distributionally robust optimization (DRO).
We show that our algorithm automatically learns a suitable $\tau$ for each sample (see the per-sample temperature sketch after this list).
Our method outperforms prior strong baselines on unimodal and bimodal datasets.
arXiv Detail & Related papers (2023-05-19T19:25:56Z)
- Long Horizon Temperature Scaling [90.03310732189543]
Long Horizon Temperature Scaling (LHTS) is a novel approach for sampling from temperature-scaled joint distributions.
We derive a temperature-dependent LHTS objective, and show that finetuning a model on a range of temperatures produces a single model capable of generation with a controllable long horizon temperature parameter.
arXiv Detail & Related papers (2023-02-07T18:59:32Z)
- Fine-tune your Classifier: Finding Correlations With Temperature [2.071516130824992]
We analyze the impact of temperature on classification tasks by describing a dataset as a set of statistics computed on representations.
We study the correlation between these extracted statistics and the observed optimal temperatures.
arXiv Detail & Related papers (2022-10-18T09:48:46Z)
- Generating Multivariate Load States Using a Conditional Variational Autoencoder [11.557259513691239]
A conditional variational autoencoder (CVAE) neural network is proposed in this paper.
The model includes latent variation of output samples under given latent vectors and co-optimizes the parameters for this output variability.
Experiments demonstrate that the proposed generator outperforms other data generating mechanisms.
arXiv Detail & Related papers (2021-10-21T19:07:04Z)
- UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning [64.638804236566]
We propose a unified framework, UniPELT, which incorporates different PELT methods as submodules and learns to activate the ones that best suit the current data or task setup.
Remarkably, on the GLUE benchmark, UniPELT consistently achieves 13pt gains compared to the best individual PELT method that it incorporates and even outperforms fine-tuning under different setups.
arXiv Detail & Related papers (2021-10-14T17:40:08Z)
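As referenced in the contrastive self-supervised learning entry above, the following is a minimal sketch of an InfoNCE-style loss with one learnable temperature per training sample. The parameterization (a clamped, learnable log-temperature table indexed by dataset id) is an assumption made for illustration and is not the DRO-based algorithm from that paper.

```python
# Illustrative sketch: contrastive (InfoNCE-style) loss with an individualized,
# learnable temperature per training sample. Not the referenced paper's algorithm.
import math

import torch
import torch.nn.functional as F


class PerSampleTemperatureInfoNCE(torch.nn.Module):
    def __init__(self, num_samples: int, init_tau: float = 0.1,
                 tau_min: float = 0.05, tau_max: float = 1.0):
        super().__init__()
        # One unconstrained log-temperature per dataset example.
        self.log_tau = torch.nn.Parameter(
            torch.full((num_samples,), math.log(init_tau)))
        self.tau_min, self.tau_max = tau_min, tau_max

    def forward(self, anchors: torch.Tensor, positives: torch.Tensor,
                sample_ids: torch.Tensor) -> torch.Tensor:
        """anchors, positives: (B, D) L2-normalized embeddings of two views;
        sample_ids: (B,) indices of the corresponding examples in the dataset."""
        tau = self.log_tau[sample_ids].exp().clamp(self.tau_min, self.tau_max)
        sim = anchors @ positives.t()              # (B, B) cosine similarities
        logits = sim / tau.unsqueeze(1)            # each anchor uses its own temperature
        targets = torch.arange(anchors.size(0), device=anchors.device)
        return F.cross_entropy(logits, targets)    # positives sit on the diagonal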
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.