Related papers: AnCoder: Anchored Code Generation via Discrete Diffusion Models

URL: http://arxiv.org/abs/2602.17688v1
Date: Thu, 05 Feb 2026 22:46:43 GMT
Title: AnCoder: Anchored Code Generation via Discrete Diffusion Models
Authors: Anton Xue, Litu Rout, Constantine Caramanis, Sanjay Shakkottai
Abstract summary: Diffusion language models offer a compelling alternative to autoregressive code generation. We introduce AnchorTree, a framework that anchors the diffusion process using structured, hierarchical priors native to code. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
Score: 36.226700922319075
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of programming languages and, as a result, often produce broken programs that fail to execute. To address this, we introduce AnchorTree, a framework that explicitly anchors the diffusion process using structured, hierarchical priors native to code. Specifically, AnchorTree uses the abstract syntax tree to prioritize resolving syntactically and semantically salient tokens, such as keywords (e.g., if, while) and identifiers (e.g., variable names), thereby establishing a structural scaffold that guides the remaining generation. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.
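The anchoring idea in the abstract (resolve keywords and identifiers first, then fill in the rest) can be illustrated with a minimal sketch. This is not the paper's method: it assumes Python's standard tokenizer as a stand-in for AST-derived saliency, and the function names `classify_tokens` and `unmasking_order` are hypothetical.

```python
import io
import keyword
import tokenize

# Layout-only token types that carry no text worth generating.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def classify_tokens(source: str) -> list[tuple[str, str]]:
    """Label each token as 'keyword', 'identifier', or 'other'.

    Assumption: keywords and identifiers play the role of anchors;
    the paper derives saliency from the abstract syntax tree instead.
    """
    labelled = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            kind = "keyword" if keyword.iskeyword(tok.string) else "identifier"
        else:
            kind = "other"
        labelled.append((tok.string, kind))
    return labelled


def unmasking_order(source: str) -> list[int]:
    """Return token positions with anchors first, remaining tokens after."""
    labelled = classify_tokens(source)
    anchors = [i for i, (_, kind) in enumerate(labelled) if kind != "other"]
    rest = [i for i, (_, kind) in enumerate(labelled) if kind == "other"]
    return anchors + rest


if __name__ == "__main__":
    snippet = "if x > 0:\n    y = x\n"
    print(classify_tokens(snippet))
    print(unmasking_order(snippet))
```

In this toy ordering, `if`, `x`, `y`, and the second `x` would be resolved before the operators and the literal, which mirrors the "structural scaffold" described above only in spirit.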

Related papers

Modular Layout Synthesis (MLS): Front-end Code via Structure Normalization and Constrained Generation [18.154715745625328]
Automated front-end engineering drastically reduces development cycles and minimizes manual coding overhead. Current solutions often produce monolithic scripts, failing to support modern ecosystems like React, Vue, or Angular. We introduce Modular Layout Synthesis (MLS), a hierarchical framework that merges visual understanding with structural normalization. MLS significantly outperforms existing baselines, ensuring superior code reusability and structural integrity across multiple frameworks.
arXiv Detail & Related papers (2025-12-22T03:24:11Z)
Correctness-Guaranteed Code Generation via Constrained Decoding [11.531496728670746]
We present a constrained runtime decoding algorithm for generating semantically correct programs. We show that our method can generate semantically correct programs conforming to any prescribed scripting API. We further show that, with careful design, our semantic guarantees extend to correctness, as validated in the application of generating game mechanics for a roguelike video game.
arXiv Detail & Related papers (2025-08-20T20:48:18Z)
TreeDiff: AST-Guided Code Generation with Diffusion LLMs [27.111814602726227]
We propose a syntax-aware diffusion framework that incorporates structural priors from Abstract Syntax Trees (ASTs) into the denoising process. Results demonstrate that syntax-aware corruption significantly improves syntactic correctness, reconstruction accuracy, and generalization to unseen code patterns.
arXiv Detail & Related papers (2025-08-02T19:46:09Z)
Efficient Guided Generation for Large Language Models [0.21485350418225244]
We show how the problem of neural text generation can be constructively reformulated in terms of transitions between the states of a finite-state machine. This framework leads to an efficient approach to guiding text generation with regular expressions and context-free grammars.
arXiv Detail & Related papers (2023-07-19T01:14:49Z)
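The finite-state view of guided generation in the entry above can be sketched as follows. This is a toy illustration under stated assumptions, not that paper's implementation: the automaton is hand-written for the regular expression `-?[0-9]+`, and the vocabulary and the names `walk` and `build_index` are hypothetical.

```python
# Hand-written DFA for -?[0-9]+ ; "d" stands for any ASCII digit.
TRANSITIONS = {
    (0, "-"): 1,
    (0, "d"): 2,
    (1, "d"): 2,
    (2, "d"): 2,
}
STATES = (0, 1, 2)
ACCEPTING = {2}


def char_class(ch: str) -> str:
    """Map a character to the symbol used in the transition table."""
    return "d" if ch in "0123456789" else ch


def walk(state: int, token: str):
    """Feed a multi-character token through the DFA.

    Returns the resulting state, or None if the token leaves the language.
    """
    for ch in token:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state


def build_index(vocab: list[str]) -> dict[int, dict[str, int]]:
    """Precompute, for each DFA state, the allowed tokens and next states."""
    index = {}
    for state in STATES:
        allowed = {}
        for tok in vocab:
            if not tok:
                continue  # an empty token would never make progress
            nxt = walk(state, tok)
            if nxt is not None:
                allowed[tok] = nxt
        index[state] = allowed
    return index


if __name__ == "__main__":
    vocab = ["-", "1", "23", "a", "-4", "5-"]
    for state, allowed in build_index(vocab).items():
        print(state, sorted(allowed))
```

Precomputing the state-to-token table once means each decoding step only needs a dictionary lookup to decide which vocabulary entries to mask out, rather than rescanning the vocabulary.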
Structured Dialogue Discourse Parsing [79.37200787463917]
Discourse parsing aims to uncover the internal structure of a multi-participant conversation. We propose a principled method that improves upon previous work from two perspectives: encoding and decoding. Experiments show that our method achieves new state-of-the-art, surpassing the previous model by 2.3 on STAC and 1.5 on Molweni.
arXiv Detail & Related papers (2023-06-26T22:51:01Z)
Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively. A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures. We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees. Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
Contrastive Learning for Source Code with Structural and Functional Properties [66.10710134948478]
We present BOOST, a novel self-supervised model to focus pre-training based on the characteristics of source code. We employ automated, structure-guided code transformation algorithms that generate functionally equivalent code that looks drastically different from the original one. We train our model in a way that brings the functionally equivalent code closer and distinct code further through a contrastive learning objective.
arXiv Detail & Related papers (2021-10-08T02:56:43Z)
GraphCodeBERT: Pre-training Code Representations with Data Flow [97.00641522327699]
We present GraphCodeBERT, a pre-trained model for programming language that considers the inherent structure of code. We use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables. We evaluate our model on four tasks, including code search, clone detection, code translation, and code refinement.
arXiv Detail & Related papers (2020-09-17T15:25:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.