Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance
- URL: http://arxiv.org/abs/2506.08980v2
- Date: Wed, 11 Jun 2025 04:29:51 GMT
- Title: Towards Better Code Generation: Adaptive Decoding with Uncertainty Guidance
- Authors: Kaifeng He, Mingwei Liu, Chong Wang, Zike Li, Yanlin Wang, Xin Peng, Zibin Zheng
- Abstract summary: We introduce AdaDec, an adaptive decoding framework guided by token-level uncertainty quantified via Shannon entropy. AdaDec achieves up to a 15.5% improvement in Pass@1 accuracy compared to greedy decoding and matches or outperforms traditional beam search.
- Score: 28.99265405319943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code generation using large language models (LLMs) is highly sensitive to the choice of tokens during decoding, especially at points of uncertainty that critically affect the generated program's logic. Conventional decoding methods such as greedy search and beam search apply uniform treatment to all tokens, neglecting the unique uncertainty characteristics inherent in code generation, which can result in suboptimal outputs. In this work, we conduct an empirical analysis demonstrating that a significant portion of generation errors arises from incorrect token ranking at high-uncertainty steps, where the ground truth token exists in the candidate set but fails to be ranked first. Inspired by this insight, we introduce AdaDec, an adaptive decoding framework guided by token-level uncertainty quantified via Shannon entropy. AdaDec dynamically learns uncertainty thresholds tailored to each model and employs a pause-then-rerank mechanism with lookahead when the uncertainty surpasses these thresholds. Evaluation on the HumanEval and MBPP benchmarks reveals that AdaDec achieves up to a 15.5% improvement in Pass@1 accuracy compared to greedy decoding, matches or outperforms traditional beam search, and reduces both computational overhead and latency through targeted, selective pausing. Our findings suggest that uncertainty-aware adaptive decoding holds considerable potential for enhancing both the reliability and efficiency of code generation with LLMs.
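The pause-then-rerank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `next_probs` model interface, the toy distribution, the fixed hand-set `threshold` (the paper learns thresholds per model), and the lookahead score (summed log-probability of a greedy continuation) are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: maps a token prefix to {token: probability}.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def lookahead_score(next_probs: NextTokenFn, prefix: List[str], candidate: str,
                    depth: int, eos: str = "<eos>") -> float:
    """Log-prob of `candidate` plus log-probs of a greedy continuation.

    Assumed scoring rule for this sketch; the paper may score differently.
    """
    score = math.log(next_probs(prefix)[candidate])
    seq = list(prefix) + [candidate]
    for _ in range(depth):
        if seq[-1] == eos:
            break
        dist = next_probs(seq)
        tok = max(dist, key=dist.get)
        score += math.log(dist[tok])
        seq.append(tok)
    return score


def adaptive_decode(next_probs: NextTokenFn, prompt: List[str], threshold: float,
                    top_k: int = 3, depth: int = 2, max_steps: int = 16,
                    eos: str = "<eos>") -> Tuple[List[str], int]:
    """Greedy decoding that pauses and reranks only at high-entropy steps."""
    seq = list(prompt)
    pauses = 0
    for _ in range(max_steps):
        dist = next_probs(seq)
        if shannon_entropy(list(dist.values())) > threshold:
            # Uncertain step: rerank the top-k candidates using lookahead.
            pauses += 1
            candidates = sorted(dist, key=dist.get, reverse=True)[:top_k]
            token = max(candidates,
                        key=lambda c: lookahead_score(next_probs, seq, c, depth, eos))
        else:
            # Confident step: plain greedy choice, no extra model calls.
            token = max(dist, key=dist.get)
        seq.append(token)
        if token == eos:
            break
    return seq[len(prompt):], pauses


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Toy distribution (illustrative only): 'a' ranks first but leads to an uncertain state."""
    last = prefix[-1] if prefix else None
    if last is None:
        return {"a": 0.5, "b": 0.45, "c": 0.05}
    if last == "a":
        return {"x": 0.34, "y": 0.33, "z": 0.33}
    return {"<eos>": 0.98, "x": 0.02}
```

Calling `adaptive_decode(toy_model, [], threshold=float("inf"))` reduces to plain greedy decoding, while a finite threshold such as `0.8` triggers a pause only at the first, high-entropy step. Confident steps incur no lookahead cost, which is the property the abstract credits for the reduced overhead relative to beam search.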
Related papers
- Confidence Optimization for Probabilistic Encoding [0.9999629695552196]
We introduce a confidence-aware mechanism to adjust distance calculations. We replace the conventional KL divergence-based variance regularization with a simpler L2 regularization term to directly constrain variance. Our method significantly improves performance and generalization on both the BERT and the RoBERTa model.
arXiv Detail & Related papers (2025-07-22T15:32:27Z)
- A Mixture of Linear Corrections Generates Secure Code [20.94236753015922]
Large language models (LLMs) have become proficient at sophisticated code-generation tasks, yet remain ineffective at reliably detecting or avoiding code vulnerabilities. We find that current LLMs encode precise internal representations that distinguish vulnerable from secure code. We develop an inference-time steering technique that subtly modulates the model's token-generation probabilities through a mixture of corrections.
arXiv Detail & Related papers (2025-07-13T06:27:33Z)
- COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
- Token-Level Uncertainty Estimation for Large Language Model Reasoning [24.56760223952017]
Large Language Models (LLMs) have demonstrated impressive capabilities, but their output quality remains inconsistent across various application scenarios. We propose a token-level uncertainty estimation framework to enable LLMs to self-assess and self-improve their generation quality in mathematical reasoning.
arXiv Detail & Related papers (2025-05-16T22:47:32Z)
- Uncertainty-Guided Chain-of-Thought for Code Generation with LLMs [45.33160999781074]
Chain-of-Thought (CoT) reasoning has been demonstrated as an effective technique for improving the problem-solving capabilities of large language models (LLMs). We introduce UnCert-CoT, an approach designed to enhance code generation by incorporating an uncertainty-aware CoT reasoning mechanism.
arXiv Detail & Related papers (2025-03-19T15:40:45Z)
- Uncertainty-Aware Decoding with Minimum Bayes Risk [70.6645260214115]
We show how Minimum Bayes Risk decoding, which selects model generations according to an expected risk, can be generalized into a principled uncertainty-aware decoding method. We show that this modified expected risk is useful for both choosing outputs and deciding when to abstain from generation and can provide improvements without incurring overhead.
arXiv Detail & Related papers (2025-03-07T10:55:12Z)
- Assessing Correctness in LLM-Based Code Generation via Uncertainty Estimation [0.0]
We explore uncertainty estimation as a proxy for correctness in LLM-generated code. We adapt two state-of-the-art techniques from natural language generation to the domain of code generation. Our findings indicate a strong correlation between the uncertainty computed through these techniques and correctness.
arXiv Detail & Related papers (2025-02-17T10:03:01Z)
- Focused-DPO: Enhancing Code Generation Through Focused Preference Optimization on Error-Prone Points [51.40935517552926]
We introduce Focused-DPO, a framework that enhances code generation by directing preference optimization towards critical error-prone areas. By focusing on error-prone points, Focused-DPO advances the accuracy and functionality of model-generated code.
arXiv Detail & Related papers (2025-02-17T06:16:02Z)
- Benchmarking Large Language Model Uncertainty for Prompt Optimization [4.151658495779136]
This paper introduces a benchmark dataset to evaluate uncertainty metrics. We show that current metrics align more with Answer Uncertainty, which reflects output confidence and diversity, rather than Correctness Uncertainty.
arXiv Detail & Related papers (2024-09-16T07:13:30Z)
- Learning a Factorized Orthogonal Latent Space using Encoder-only Architecture for Fault Detection; An Alarm management perspective [0.2455468619225742]
This paper introduces a novel encoder-based residual design that effectively decouples erroneously identified and deterministic components of process variables.
The proposed model employs two distinct encoders to factorize the latent space into two spaces: one for the deterministic part and the other for the part.
The proposed model significantly enhances prediction quality while achieving nearly zero false alarms and missed detections.
arXiv Detail & Related papers (2024-08-24T09:00:45Z)
- Uncertainty Awareness of Large Language Models Under Code Distribution Shifts: A Benchmark Study [14.507068647009602]
Large Language Models (LLMs) have been widely employed in programming language analysis to enhance human productivity.
Their reliability can be compromised by various code distribution shifts, leading to inconsistent outputs.
Probability methods are known to mitigate such impact through uncertainty calibration and estimation.
arXiv Detail & Related papers (2024-01-12T00:00:32Z)
- Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
- Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited due to high computational requirements and is sub-optimal due to the exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
- Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
- Deep Momentum Uncertainty Hashing [65.27971340060687]
We propose a novel method, Deep Momentum Uncertainty Hashing (DMUH).
It explicitly estimates the uncertainty during training and leverages the uncertainty information to guide the approximation process.
Our method achieves the best performance on all of the datasets and surpasses existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-09-17T01:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.