Related papers: Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation

Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation

URL: http://arxiv.org/abs/2602.01187v1
Date: Sun, 01 Feb 2026 12:22:46 GMT
Title: Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation
Authors: Chengran Yang, Zichao Wei, Heminghao Deng, Jinfeng Jiang, Zhensu Sun, Ting Zhang, Tianyi Wu, Ming Wen, David Lo,
Abstract summary: Stream of Revision is a paradigm shift that elevates code generation from a monotonic stream to a dynamic, self-correcting trajectory.<n>We introduce specific action tokens that enable the model to seamlessly backtrack and edit its own history within a single forward pass.
Score: 17.125957722393327
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model (LLM) based code generation is predominantly formulated as a strictly monotonic process, appending tokens linearly to an immutable prefix. This formulation contrasts to the cognitive process of programming, which is inherently interleaved with forward generation and on-the-fly revision. While prior works attempt to introduce revision via post-hoc agents or external static tools, they either suffer from high latency or fail to leverage the model's intrinsic semantic reasoning. In this paper, we propose Stream of Revision, a paradigm shift that elevates code generation from a monotonic stream to a dynamic, self-correcting trajectory by leveraging model's intrinsic capabilities. We introduce specific action tokens that enable the model to seamlessly backtrack and edit its own history within a single forward pass. By internalizing the revision loop, our framework Stream of Revision allows the model to activate its latent capabilities just-in-time without external dependencies. Empirical results on secure code generation show that Stream of Revision significantly reduces vulnerabilities with minimal inference overhead.

Related papers

CORE: Context-Robust Remasking for Diffusion Language Models [51.59514489363897]
We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision.<n>Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations.<n>On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
arXiv Detail & Related papers (2026-02-04T00:12:30Z)
Training-Free Self-Correction for Multimodal Masked Diffusion Models [61.84305395626145]
We propose a training-free self-correction framework that exploits the inductive biases of pre-trained masked diffusion models.<n>Our method significantly improves generation quality on text-to-image generation and multimodal understanding tasks with reduced sampling steps.
arXiv Detail & Related papers (2026-02-02T23:58:15Z)
Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models [1.3535770763481902]
Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies.<n>We propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning.<n>DiDAE first edits foundation model embeddings in interpretable disentangled directions of the disentangled dictionary and then decodes them via a diffusion autoencoder.
arXiv Detail & Related papers (2026-01-29T15:25:37Z)
PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations [52.67948063133533]
Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs.<n>Existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces.<n>We propose Promise, a novel framework that integrates dense, step-by-step verification into generative models.
arXiv Detail & Related papers (2026-01-08T07:38:46Z)
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model [72.73512218682187]
We introduce ReDiff, a refining-enhanced diffusion framework that teaches the model to identify and correct its own errors.<n>Our approach features a two-stage training process: first, we instill a foundational revision capability by training the model to revise synthetic errors; second, we implement a novel online self-correction loop.<n>This mistake-driven learning endows the model with the crucial ability to revisit and refine its already generated output, effectively breaking the error cascade.
arXiv Detail & Related papers (2025-10-22T06:58:55Z)
Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset [0.0]
We present ReDef, a high-confidence benchmark of function-level modifications curated from 22 large-scale C/C++ projects.<n>Defective cases are anchored by revert commits, while clean cases are validated through post-hoc history checks.<n>This pipeline yields 3,164 defective and 10,268 clean modifications, offering substantially more reliable labels than prior existing resources.
arXiv Detail & Related papers (2025-09-11T07:07:11Z)
$φ^{\infty}$: Clause Purification, Embedding Realignment, and the Total Suppression of the Em Dash in Autoregressive Language Models [0.0]
We identify a critical vulnerability in autoregressive transformer language models where the em dash token induces semantic drift.<n>We propose a novel solution combining symbolic clause purification via the phi-infinity operator with targeted embedding matrix.
arXiv Detail & Related papers (2025-06-22T18:27:39Z)
Interpret the Internal States of Recommendation Model with Sparse Autoencoder [28.234859617081295]
RecSAE is an automated and generalizable probing framework that interprets Recommenders with Sparse AutoEncoder.<n>It extracts interpretable latents from the internal states of recommendation models and links them to semantic concepts for interpretation.<n> RecSAE does not alter original models during interpretation and also enables targeted de-biasing to models based on interpreted results.
arXiv Detail & Related papers (2024-11-09T08:22:31Z)
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement [80.18490952057125]
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. We propose Context-Wise Order-Agnostic Language Modeling (COrAL) to overcome these challenges. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally.
arXiv Detail & Related papers (2024-10-12T23:56:19Z)
Self-Infilling Code Generation [60.12883980846781]
We introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding. Our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
arXiv Detail & Related papers (2023-11-29T16:02:06Z)
DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models. It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer. It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
Preventing Posterior Collapse with Levenshtein Variational Autoencoder [61.30283661804425]
We propose to replace the evidence lower bound (ELBO) with a new objective which is simple to optimize and prevents posterior collapse. We show that Levenstein VAE produces more informative latent representations than alternative approaches to preventing posterior collapse.
arXiv Detail & Related papers (2020-04-30T13:27:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.