Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
- URL: http://arxiv.org/abs/2309.02772v3
- Date: Thu, 28 Dec 2023 10:54:36 GMT
- Title: Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
- Authors: Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
- Abstract summary: We conduct the first systematic study to explore a decoding strategy specialized in code generation.
Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling.
Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategies.
- Score: 54.72004797421481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Large Language Models (LLMs) have shown impressive abilities in
code generation. However, existing LLMs' decoding strategies are designed for
Natural Language (NL) generation, overlooking the differences between NL and
programming languages (PL). Due to this oversight, a better decoding strategy
for code generation remains an open question. In this paper, we conduct the
first systematic study to explore a decoding strategy specialized in code
generation. With an analysis of loss distributions of code tokens, we find that
code tokens can be divided into two categories: challenging tokens that are
difficult to predict and confident tokens that can be easily inferred. Among
them, the challenging tokens mainly appear at the beginning of a code block.
Inspired by the above findings, we propose a simple yet effective method:
Adaptive Temperature (AdapT) sampling, which dynamically adjusts the
temperature coefficient when decoding different tokens. We apply a larger
temperature when sampling for challenging tokens, allowing LLMs to explore
diverse choices. We employ a smaller temperature for confident tokens, avoiding
the influence of tail randomness noise. We apply AdapT sampling to LLMs of
different sizes and conduct evaluations on two popular datasets. Results show
that AdapT sampling significantly outperforms state-of-the-art decoding
strategies.
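The abstract describes AdapT sampling only at a high level, so below is a minimal sketch of per-token temperature switching, assuming a PyTorch decoding loop. The confidence heuristic (top-token probability), the threshold, and the two temperature values are illustrative assumptions, not the paper's exact rule for identifying challenging tokens.

```python
import torch
import torch.nn.functional as F

def adaptive_temperature_sample(logits: torch.Tensor,
                                high_temp: float = 1.0,
                                low_temp: float = 0.2,
                                conf_threshold: float = 0.7) -> torch.Tensor:
    """Sample one token id from a 1-D logits vector with a per-step temperature.

    Illustrative heuristic: a step is treated as 'confident' when the model's
    top probability is high, and as 'challenging' otherwise (e.g. at the start
    of a code block, where the paper observes most challenging tokens appear).
    """
    probs = F.softmax(logits, dim=-1)
    top_prob = probs.max().item()

    # Challenging token -> larger temperature (more exploration);
    # confident token  -> smaller temperature (less tail-randomness noise).
    temperature = low_temp if top_prob >= conf_threshold else high_temp

    scaled_probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(scaled_probs, num_samples=1)
```

In a generation loop, this function would replace the fixed-temperature sampling step: the model's next-token logits are passed in at every position, so the effective temperature varies token by token rather than being set once per sequence.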
Related papers
- Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities.
The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing speedup ratios of 1.9x-3x on several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model [20.979790612689992]
Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs).
Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token.
This paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC).
arXiv Detail & Related papers (2024-06-28T13:20:17Z) - Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation [32.85339480783571]
We introduce a new decoding approach named Debiasing-Diversifying Decoding (D3).
D3 disables length normalization for ghost tokens to alleviate amplification bias.
Experiments on real-world datasets demonstrate the method's effectiveness.
arXiv Detail & Related papers (2024-06-21T06:47:28Z) - SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation [35.10931307279044]
This paper proposes Self-Evaluation Decoding, SED, a decoding method for enhancing model generation.
It integrates speculation and evaluation steps into the decoding process, allowing LLMs to make more careful decisions.
arXiv Detail & Related papers (2024-05-26T12:43:18Z) - Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models [6.646510073473929]
We propose SlimCode, a model-agnostic code simplification solution for Large Language Models.
SlimCode improves over the state-of-the-art technique by 9.46% and 5.15% in terms of MRR and BLEU score on code search and summarization, respectively.
arXiv Detail & Related papers (2024-05-18T06:15:52Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z) - COCO-LM: Correcting and Contrasting Text Sequences for Language Model
Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences.
COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences.
Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)