Correctness-Guaranteed Code Generation via Constrained Decoding
- URL: http://arxiv.org/abs/2508.15866v1
- Date: Wed, 20 Aug 2025 20:48:18 GMT
- Title: Correctness-Guaranteed Code Generation via Constrained Decoding
- Authors: Lingxiao Li, Salar Rahili, Yiwei Zhao
- Abstract summary: We present a constrained decoding algorithm for generating semantically correct programs. We show that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, our semantic guarantees extend to runtime correctness, as validated in the application of generating game mechanics for a roguelike video game.
- Score: 11.531496728670746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language Models (LMs) are increasingly being used for code generation, but ensuring the correctness of generated programs remains a significant challenge. Although imperfect code may be acceptable during software development with human oversight, domains such as video games and robotics require one-shot correctness for runtime-critical components. We present a constrained decoding algorithm for generating semantically correct programs that incorporates a context-sensitive parser, which, at each step, outputs a regular expression that satisfies a critical non-extensible property to guide the generation of the next token sequence that can continue to a correct program. To build such a context-sensitive parser, we propose a framework of a dynamic tree of parsers (ToP) during parsing, where each parser corresponds to a modular context-free grammar enriched with contextual information such as variable scopes and type constraints, with tree branches representing ambiguity in the future code segment. We demonstrate our approach through sLua, a strongly typed variant of Lua, showing that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, our semantic guarantees extend to runtime correctness, as validated in the application of generating game mechanics for a roguelike video game.
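To make the decoding loop concrete, the following Python sketch shows regex-constrained token selection in the spirit of the abstract. It is a minimal illustration under stated assumptions, not the paper's implementation: `parser.allowed_pattern`, `parser.is_complete`, and `model.next_token_logits` are hypothetical interfaces, and the third-party `regex` module is used because its `partial=True` mode accepts prefixes of a full match.

```python
import regex  # third-party "regex" module: supports partial (prefix) matches

def constrained_decode(model, tokenizer, parser, prompt, max_steps=256):
    # Hypothetical interfaces standing in for the paper's components:
    #   parser.allowed_pattern(code) -> regex the next token sequence must match
    #   parser.is_complete(code)     -> True once `code` is a full program
    #   model.next_token_logits(ids) -> one score per vocabulary token
    ids = list(tokenizer.encode(prompt))
    code = ""
    for _ in range(max_steps):
        if parser.is_complete(code):
            break
        pattern = regex.compile(parser.allowed_pattern(code))
        chunk = ""  # token sequence being matched against `pattern`
        while True:
            logits = model.next_token_logits(ids)
            best_id, best_score = None, float("-inf")
            for tok_id, score in enumerate(logits):
                piece = tokenizer.decode([tok_id])
                # partial=True accepts any prefix of a full match, so a token
                # survives only if it can still be extended to a match.
                if score > best_score and pattern.fullmatch(
                        chunk + piece, partial=True):
                    best_id, best_score = tok_id, score
            if best_id is None:
                # With non-extensible patterns (the paper's key property)
                # an admissible token should always exist; guard anyway.
                raise RuntimeError("no admissible continuation")
            ids.append(best_id)
            chunk += tokenizer.decode([best_id])
            if pattern.fullmatch(chunk):
                break  # non-extensibility: a full match cannot be extended
        code += chunk
    return code
```

The break on a full match leans on the abstract's non-extensible property: a completed match cannot be extended, so decoding can safely hand control back to the parser for the next pattern.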
Related papers
- AnCoder: Anchored Code Generation via Discrete Diffusion Models [36.226700922319075]
Diffusion language models offer a compelling alternative to autoregressive code generation. We introduce AnchorTree, a framework that anchors the diffusion process using structured, hierarchical priors native to code. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
arXiv Detail & Related papers (2026-02-05T22:46:43Z)
- TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation [2.2940141855172036]
TreeCoder represents decoding as a tree search over candidate programs. TreeCoder consistently improves accuracy across open-source models such as CodeLlama, Mistral and DeepSeek.
arXiv Detail & Related papers (2025-11-27T09:59:39Z)
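TreeCoder's decoding-as-tree-search idea can be illustrated with a generic best-first search over candidate programs. This is a hedged sketch, not TreeCoder's actual algorithm or API: `expand`, `score`, and `is_complete` are assumed callbacks.

```python
import heapq

def tree_search_decode(expand, score, is_complete, root,
                       beam_width=5, max_steps=100):
    # Hypothetical callbacks, not TreeCoder's API:
    #   expand(node)      -> iterable of child nodes (candidate continuations)
    #   score(node)       -> higher is better (e.g., model log-probability)
    #   is_complete(node) -> True when node is a finished program
    counter = 0  # tie-breaker so heapq never compares nodes directly
    frontier = [(-score(root), counter, root)]  # min-heap on negated score
    for _ in range(max_steps):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)
        if is_complete(node):
            return node
        for child in expand(node):
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
        # Prune to the best `beam_width` candidates to bound the search;
        # nsmallest returns a sorted list, which is already a valid heap.
        frontier = heapq.nsmallest(beam_width, frontier)
    return None  # search budget exhausted without a complete program
```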
- Type-Constrained Code Generation with Language Models [51.03439021895432]
We introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. Our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
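The filtering step behind type-constrained decoding can be sketched as follows. `type_checks` is a hypothetical stand-in for the paper's prefix automata and search over inhabitable types; the real system checks prefix typability incrementally rather than re-validating whole candidates.

```python
def type_constrained_step(logits, tokenizer, partial_code, type_checks):
    # type_checks(code) is a hypothetical stand-in for the paper's prefix
    # automata: True iff `code` is a prefix of some well-typed program.
    best = None
    for tok_id, score in enumerate(logits):
        candidate = partial_code + tokenizer.decode([tok_id])
        if type_checks(candidate) and (best is None or score > best[0]):
            best = (score, tok_id)
    if best is None:
        raise RuntimeError("no token preserves well-typedness")
    return best[1]  # greedy: highest-scoring admissible token
```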
- Semantic Parsing with Candidate Expressions for Knowledge Base Question Answering [4.795837146925278]
Semantic parsers convert natural language to logical forms, which can be evaluated on knowledge bases (KBs) to produce denotations. Recent semantic parsers have been developed with sequence-to-sequence (seq2seq) pre-trained language models (PLMs). We propose a grammar augmented with candidate expressions for semantic parsing on a large KB with a seq2seq PLM.
arXiv Detail & Related papers (2024-10-01T05:46:22Z)
- Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs [57.27982780697922]
Large language models have demonstrated exceptional capability in natural language understanding and generation.
However, their generation speed is limited by the inherently sequential nature of their decoding process.
This paper introduces Lexical Unit Decoding, a novel decoding methodology implemented in a data-driven manner.
arXiv Detail & Related papers (2024-05-24T04:35:13Z)
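One plausible (hypothetical) reading of lexical unit decoding is speculative acceptance of multi-token units: predict several positions in one pass and accept the confident prefix. The `model.topk_parallel` interface below is assumed for illustration, not taken from the paper.

```python
def lexical_unit_decode(model, prompt_ids, unit_len=4,
                        threshold=0.9, max_new=128):
    # model.topk_parallel(ids, n) is an assumed interface that predicts the
    # next n positions in one forward pass, returning (token, prob) pairs.
    out = list(prompt_ids)
    while len(out) - len(prompt_ids) < max_new:
        draft = model.topk_parallel(out, unit_len)
        accepted = 0
        for tok, prob in draft:
            if prob < threshold:
                break  # confidence too low: stop accepting the draft here
            out.append(tok)
            accepted += 1
        if accepted == 0:
            # Fall back to ordinary single-token decoding for this step.
            tok, _ = model.topk_parallel(out, 1)[0]
            out.append(tok)
    return out
```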
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to better capture programming domain knowledge. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Constrained Decoding for Fill-in-the-Middle Code Language Models via Efficient Left and Right Quotienting of Context-Sensitive Grammars [11.279507894576213]
This paper contributes an incremental parsing approach that allows early rejection of syntactically incorrect code.
We extend the Earley parsing algorithm to allow for left and right quotients of context-free grammars.
arXiv Detail & Related papers (2024-02-28T02:12:47Z)
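The paper quotients context-free grammars inside an Earley parser. As a simpler, self-contained analogue (explicitly not the paper's algorithm), the sketch below computes left quotients of regular languages via Brzozowski derivatives, where the derivative of a language L by character c is { w : cw ∈ L }.

```python
from dataclasses import dataclass

# Regular-expression AST: Empty (no strings), Eps (only ""), literal
# characters, concatenation, alternation, and Kleene star.
@dataclass(frozen=True)
class Empty:
    pass

@dataclass(frozen=True)
class Eps:
    pass

@dataclass(frozen=True)
class Chr:
    c: str

@dataclass(frozen=True)
class Cat:
    left: object
    right: object

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    body: object

def nullable(r):
    """True iff the language of r contains the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return nullable(r.left) or nullable(r.right)  # Alt

def deriv(r, c):
    """Brzozowski derivative: the left quotient of r's language by c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Star):
        return Cat(deriv(r.body, c), r)
    # Cat: if the left side can be empty, c may also start the right side.
    d = Cat(deriv(r.left, c), r.right)
    return Alt(d, deriv(r.right, c)) if nullable(r.left) else d

def matches(r, s):
    """Membership by repeated quotienting: s is in L(r) iff quotienting
    by every character of s leaves a language containing ""."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

# Example: (ab)* accepts "abab" but not "aba".
r = Star(Cat(Chr("a"), Chr("b")))
assert matches(r, "abab") and not matches(r, "aba")
```

Right quotients can be obtained symmetrically by working on the reversed language.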
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- Transformer with Tree-order Encoding for Neural Program Generation [8.173517923612426]
We introduce a tree-based positional encoding and a shared natural-language subword vocabulary for Transformers.
Our findings suggest that employing a tree-based positional encoding in combination with a shared natural-language subword vocabulary improves generation performance over sequential positional encodings.
arXiv Detail & Related papers (2022-05-30T12:27:48Z)
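One common way to realize a tree-based positional encoding, sketched here under assumptions and not necessarily this paper's exact scheme, is to encode each node by its root-to-node path of child indices:

```python
def tree_positional_encoding(node_path, max_depth=16, branching=8):
    # Encode a node by its root-to-node path of child indices: each step
    # is one-hot over `branching` slots, concatenated and zero-padded.
    enc = [0.0] * (max_depth * branching)
    for depth, child in enumerate(node_path[:max_depth]):
        enc[depth * branching + min(child, branching - 1)] = 1.0
    return enc

# Example: the node reached by taking child 1, then child 0, of the root.
vec = tree_positional_encoding([1, 0])
assert vec[0 * 8 + 1] == 1.0 and vec[1 * 8 + 0] == 1.0
```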
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution-result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
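Execution-result-based MBR selection is straightforward to sketch: sample candidate programs, run them on shared inputs, and keep the candidate whose behavior agrees with the most other samples. `run` is an assumed sandboxed executor; this is an illustrative sketch, not the paper's exact risk function.

```python
from collections import Counter

def mbr_select(programs, test_inputs, run):
    # run(program, x) is an assumed sandboxed executor. Each candidate is
    # summarized by its outputs on shared inputs; the candidate whose
    # behavior is most common among the samples minimizes expected
    # disagreement (risk) under the sample distribution.
    def signature(prog):
        outs = []
        for x in test_inputs:
            try:
                outs.append(repr(run(prog, x)))
            except Exception:
                outs.append("<error>")
        return tuple(outs)

    sigs = [signature(p) for p in programs]
    counts = Counter(sigs)
    best = max(range(len(programs)), key=lambda i: counts[sigs[i]])
    return programs[best]
```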
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework that leverages both lexical copying and retrieval of semantically similar code. We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
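At a high level, retrieval-augmented completion prepends similar retrieved code to the prompt before completing. The sketch below assumes hypothetical `retriever.search` and `model.complete` interfaces; ReACC's actual retriever combines lexical and semantic similarity signals.

```python
def retrieval_augmented_complete(model, retriever, partial_code, k=2):
    # Hypothetical interfaces: retriever.search(query, k) returns the k
    # most similar code fragments; model.complete(prompt) finishes the code.
    hits = retriever.search(partial_code, k)
    # Prepend retrieved fragments as extra context, then complete the
    # original partial code with the retrieved examples in view.
    context = "\n".join(f"# retrieved:\n{frag}" for frag in hits)
    return model.complete(f"{context}\n{partial_code}")
```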
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.