Hot or Cold? Adaptive Temperature Sampling for Code Generation with
Large Language Models
- URL: http://arxiv.org/abs/2309.02772v3
- Date: Thu, 28 Dec 2023 10:54:36 GMT
- Authors: Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
- Abstract summary: We conduct the first systematic study to explore a decoding strategy specialized in code generation.
Based on an analysis of the loss distributions of code tokens, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling.
Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
- Score: 54.72004797421481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Large Language Models (LLMs) have shown impressive abilities in
code generation. However, existing LLMs' decoding strategies are designed for
Natural Language (NL) generation, overlooking the differences between NL and
programming languages (PL). Due to this oversight, a better decoding strategy
for code generation remains an open question. In this paper, we conduct the
first systematic study to explore a decoding strategy specialized in code
generation. With an analysis of loss distributions of code tokens, we find that
code tokens can be divided into two categories: challenging tokens that are
difficult to predict and confident tokens that can be easily inferred. Among
them, the challenging tokens mainly appear at the beginning of a code block.
Inspired by the above findings, we propose a simple yet effective method:
Adaptive Temperature (AdapT) sampling, which dynamically adjusts the
temperature coefficient when decoding different tokens. We apply a larger
temperature when sampling for challenging tokens, allowing LLMs to explore
diverse choices. We employ a smaller temperature for confident tokens, avoiding
the influence of tail randomness noise. We apply AdapT sampling to LLMs of
different sizes and conduct evaluations on two popular datasets. Results show
that AdapT sampling significantly outperforms the state-of-the-art decoding
strategy.
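The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rule used to flag a challenging token (the token that follows a newline, as a stand-in for "the beginning of a code block"), the function names, and the temperature values `high=1.0` and `low=0.2` are all assumptions chosen for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_temperature(previous_token, high=1.0, low=0.2):
    """Hypothetical rule: a token that starts a new line is 'challenging'.

    `high` and `low` are illustrative values, not taken from the paper.
    """
    starts_block = previous_token is None or previous_token.endswith("\n")
    return high if starts_block else low


def sample_token(vocab, logits, previous_token, rng):
    """Sample one token with a temperature chosen for the current position."""
    temperature = pick_temperature(previous_token)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Toy vocabulary and logits (made up for the demonstration).
vocab = ["for", "if", "return", "x"]
logits = [2.0, 1.5, 1.0, 0.5]
rng = random.Random(0)

# Start of a line: higher temperature, flatter distribution.
hot = softmax_with_temperature(logits, pick_temperature("\n"))
# Mid-line: lower temperature, probability mass concentrates on the top token.
cold = softmax_with_temperature(logits, pick_temperature("x"))
print([round(p, 3) for p in hot])
print([round(p, 3) for p in cold])
print(sample_token(vocab, logits, "\n", rng))
```

With the same logits, the lower temperature puts more probability on the top-ranked token, which is how a small temperature suppresses samples from the tail of the distribution, while the higher temperature leaves more room for alternative choices.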
Related papers
- Min P Sampling: Balancing Creativity and Coherence at High Temperature [2.6639520483183867]
min-$p$ is a dynamic truncation sampling method that scales according to the probability of the top candidate token.
We demonstrate that min-$p$ improves the coherence and quality of generated text even at high temperatures.
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
- Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model [10.682263930467196]
The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs).
Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they employ a router to predict the routing for each token.
This paper proposes a novel method based on token-level gradient analysis.
arXiv Detail & Related papers (2024-06-28T13:20:17Z)
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
- SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation [35.10931307279044]
This paper proposes Self-Evaluation Decoding, SED, a decoding method for enhancing model generation.
It integrates speculation and evaluation steps into the decoding process, allowing LLMs to make more careful decisions.
arXiv Detail & Related papers (2024-05-26T12:43:18Z)
- Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models [6.646510073473929]
We propose SlimCode, a model-agnostic code simplification solution for Large Language Models.
SlimCode can improve the state-of-the-art technique by 9.46% and 5.15% in terms of MRR and BLEU score on code search and summarization.
arXiv Detail & Related papers (2024-05-18T06:15:52Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing.
The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.