Related papers: Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models

URL: http://arxiv.org/abs/2309.02772v3
Date: Thu, 28 Dec 2023 10:54:36 GMT
Title: Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Authors: Yuqi Zhu, Jia Li, Ge Li, YunFei Zhao, Jia Li, Zhi Jin, Hong Mei
Abstract summary: We conduct the first systematic study to explore a decoding strategy specialized for code generation. We propose a simple yet effective method: Adaptive Temperature (AdapT) sampling. Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
Score: 54.72004797421481
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming languages (PL). Due to this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized for code generation. With an analysis of loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling for challenging tokens, allowing LLMs to explore diverse choices. We employ a smaller temperature for confident tokens, avoiding the influence of tail randomness noise. We apply AdapT sampling to LLMs with different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
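
Illustrative sketch (not from the paper): the abstract describes using a larger temperature for challenging tokens and a smaller one for confident tokens. The Python below shows one way that idea could look. The function names, the is_block_start flag, and the temperature values 1.0 and 0.2 are assumptions chosen for illustration; the abstract does not give the authors' actual rule or constants.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_temperature(is_block_start, high=1.0, low=0.2):
    """Hypothetical rule: a higher temperature at the start of a code block
    (where the abstract says challenging tokens mainly appear), a lower one
    elsewhere. The default values are illustrative, not from the paper."""
    return high if is_block_start else low


def sample_token(logits, is_block_start, rng):
    """Sample one token index using the position-dependent temperature."""
    probs = softmax_with_temperature(logits, pick_temperature(is_block_start))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits over a three-token vocabulary.
logits = [2.0, 1.0, 0.5]
hot = softmax_with_temperature(logits, pick_temperature(True))
cold = softmax_with_temperature(logits, pick_temperature(False))
token = sample_token(logits, is_block_start=True, rng=random.Random(0))
```

With the same logits, the lower temperature concentrates more probability on the top-ranked token, while the higher temperature leaves more room for alternatives, which matches the trade-off the abstract describes.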

Related papers

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs. We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
Crystal: Illuminating LLM Abilities on Language and Code [58.5467653736537]
We propose a pretraining strategy to enhance the integration of natural language and coding capabilities. The resulting model, Crystal, demonstrates remarkable capabilities in both domains.
arXiv Detail & Related papers (2024-11-06T10:28:46Z)
FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step. We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model [20.979790612689992]
Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually employ a router to predict the routing of each token. This paper proposes a novel method based on token-level gradient analysis, i.e., Solving Token Gradient Conflict (STGC).
arXiv Detail & Related papers (2024-06-28T13:20:17Z)
Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation [32.85339480783571]
We introduce a new decoding approach named Debiasing-Diversifying Decoding (D3). D3 disables length normalization for ghost tokens to alleviate amplification bias. Experiments on real-world datasets demonstrate the method's effectiveness.
arXiv Detail & Related papers (2024-06-21T06:47:28Z)
SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation [35.10931307279044]
This paper proposes Self-Evaluation Decoding, SED, a decoding method for enhancing model generation. It integrates speculation and evaluation steps into the decoding process, allowing LLMs to make more careful decisions.
arXiv Detail & Related papers (2024-05-26T12:43:18Z)
Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models [6.646510073473929]
We propose SlimCode, a model-agnostic code simplification solution for Large Language Models. SlimCode can improve the state-of-the-art technique by 9.46% and 5.15% in terms of MRR and BLEU score on code search and summarization.
arXiv Detail & Related papers (2024-05-18T06:15:52Z)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation. CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details. Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results. Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results. LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining [59.169836983883656]
COCO-LM is a new self-supervised learning framework that pretrains Language Models by COrrecting challenging errors and COntrasting text sequences. COCO-LM employs an auxiliary language model to mask-and-predict tokens in original text sequences. Our analyses reveal that COCO-LM's advantages come from its challenging training signals, more contextualized token representations, and regularized sequence representations.
arXiv Detail & Related papers (2021-02-16T22:24:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.