CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs
- URL: http://arxiv.org/abs/2602.07080v1
- Date: Fri, 06 Feb 2026 03:49:15 GMT
- Title: CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs
- Authors: Yicheng He, Zheng Zhao, Zhou Kaiyu, Bryan Dai, Jie Fu, Yonghui Yang,
- Abstract summary: We aim to investigate whether the model's neural dynamics encode internally decodable signals that are predictive of logical validity during code generation. By decomposing complex residual flows, we aim to identify the structural signatures that distinguish sound reasoning from logical failure. Analysis across Python, C++, and Java confirms that intrinsic correctness signals are robust across diverse syntaxes.
- Score: 13.488544043942495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current paradigms for code verification rely heavily on external mechanisms, such as execution-based unit tests or auxiliary LLM judges, which are often labor-intensive or limited by the judging model's own capabilities. This raises a fundamental, yet unexplored question: can the functional correctness of LLM-generated code be assessed purely from the model's internal computational structure? Our primary objective is to investigate whether the model's neural dynamics encode internally decodable signals that are predictive of logical validity during code generation. Inspired by mechanistic interpretability, we propose to treat code verification as a mechanistic diagnostic task, mapping the model's explicit algorithmic trajectory into line-level attribution graphs. By decomposing complex residual flows, we aim to identify the structural signatures that distinguish sound reasoning from logical failure within the model's internal circuits. Analysis across Python, C++, and Java confirms that intrinsic correctness signals are robust across diverse syntaxes. Topological features from these internal graphs predict correctness more reliably than surface heuristics and enable targeted causal interventions to fix erroneous logic. These findings establish internal introspection as a decodable property for verifying generated code. Our code is at https://github.com/bruno686/CodeCircuit.
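The core idea, scoring generated code by topological features of a line-level attribution graph, can be illustrated with a toy sketch. Everything below (the graph encoding, the feature set, and the decision thresholds) is an assumption made for illustration, not the authors' actual implementation:

```python
# Toy sketch: score a line-level attribution graph with simple topological
# features. Nodes are code lines; directed weighted edges are attribution
# strengths (how much one line's internal state influences a later line).

def graph_features(edges, n_lines):
    """Compute simple topological features of an attribution graph.

    edges: dict mapping (src_line, dst_line) -> attribution weight
    """
    degree = [0] * n_lines
    total_weight = 0.0
    for (src, dst), w in edges.items():
        degree[src] += 1
        degree[dst] += 1
        total_weight += w
    possible = n_lines * (n_lines - 1) or 1   # max directed edges
    density = len(edges) / possible           # how connected the graph is
    mean_weight = total_weight / max(len(edges), 1)
    isolated = sum(1 for d in degree if d == 0)  # lines with no attribution flow
    return {"density": density, "mean_weight": mean_weight, "isolated": isolated}

def looks_correct(features, density_floor=0.25, isolated_cap=0):
    # Hypothetical decision rule: sound reasoning tends to yield densely
    # connected graphs with no "orphaned" lines; the thresholds are made up.
    return features["density"] >= density_floor and features["isolated"] <= isolated_cap

# Example: a 4-line program whose lines all feed forward into later lines.
edges = {(0, 1): 0.9, (0, 2): 0.4, (1, 2): 0.7, (2, 3): 0.8}
feats = graph_features(edges, n_lines=4)
print(looks_correct(feats))  # every line participates, graph is dense
```

A real pipeline would derive `edges` from residual-stream attributions and would learn the classifier from labeled runs rather than hand-set thresholds.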
Related papers
- A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code [49.09545217453401]
Propensity Smelly Score (PSC) is a metric that estimates the likelihood of generating particular smell types. We identify how generation strategy, model size, model architecture and prompt formulation shape the structural properties of generated code. PSC helps developers interpret model behavior and assess code quality, providing evidence that smell propensity signals can support human judgement.
arXiv Detail & Related papers (2025-11-19T19:18:28Z)
- QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation [47.82802346420197]
We propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV). We verify the functional correctness of signals in the generated module by comparing them with those of the reference module in the training data. Finally, we introduce signal-aware DPO, which is optimized on the correct signal-level code segments.
arXiv Detail & Related papers (2025-10-22T06:58:07Z)
- Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities. We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z)
- Mechanistic Interpretability of Code Correctness in LLMs via Sparse Autoencoders [0.0]
We apply sparse autoencoders to decompose the representations of Large Language Models, identifying directions that correspond to code correctness. We find that code correctness directions in LLMs reliably predict incorrect code, while correction capabilities, though statistically significant, involve tradeoffs between fixing errors and preserving correct code. Our mechanistic insights suggest three practical applications: prompting strategies should prioritize test examples over elaborate problem descriptions, predictor directions can serve as error alarms for developer review, and these same predictors can guide selective steering, intervening only when errors are anticipated, to prevent the code corruption caused by constant steering.
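The "error alarm" use of a correctness direction can be sketched in plain Python. The direction vector, activation vectors, and threshold below are made-up stand-ins for what a sparse autoencoder would actually recover from a model's hidden states:

```python
# Sketch: flag likely-incorrect code by projecting a hidden activation onto
# a learned "correctness direction". All vectors and the threshold here are
# invented for illustration.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return sum(a * a for a in u) ** 0.5

def correctness_score(activation, direction):
    """Cosine similarity between the activation and the direction."""
    return dot(activation, direction) / (norm(activation) * norm(direction))

def error_alarm(activation, direction, threshold=0.0):
    # The alarm fires (and selective steering would be triggered) only when
    # the projection falls below the threshold, i.e. the generation "looks"
    # internally misaligned with the correctness direction.
    return correctness_score(activation, direction) < threshold

correct_dir = [0.6, 0.8, 0.0]    # hypothetical SAE-derived direction
healthy_act = [1.2, 1.6, 0.1]    # roughly aligned with the direction
suspect_act = [-0.9, -1.1, 0.4]  # roughly anti-aligned

print(error_alarm(healthy_act, correct_dir))  # aligned -> no alarm
print(error_alarm(suspect_act, correct_dir))  # anti-aligned -> alarm
```

Gating interventions on this alarm, rather than steering every generation, is what the summary above means by avoiding corruption from constant steering.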
arXiv Detail & Related papers (2025-10-03T11:44:21Z)
- Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z)
- Position: Intelligent Coding Systems Should Write Programs with Justifications [9.304020701255093]
We argue that these systems should not only generate code but also produce clear, consistent justifications that bridge model reasoning and user understanding. We advocate exploring neuro-symbolic approaches for justification generation, where symbolic constraints guide behavior during training and program semantics are enriched through neural representations.
arXiv Detail & Related papers (2025-08-08T05:04:47Z)
- Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback [36.69082579950107]
Large language models (LLMs) have shown strong performance in Verilog generation from natural language descriptions. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs.
arXiv Detail & Related papers (2025-04-22T11:38:14Z)
- Correctness Assessment of Code Generated by Large Language Models Using Internal Representations [4.32362000083889]
We introduce OPENIA, a novel framework to assess the correctness of code generated by Large Language Models (LLMs). Our empirical analysis reveals that LLMs' internal representations encode latent information, which strongly correlates with the correctness of the generated code. OPENIA consistently outperforms baseline models, achieving higher accuracy, precision, recall, and F1-Scores with up to a 2X improvement in standalone code generation.
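The general recipe behind such frameworks, training a lightweight probe that maps an internal representation to a correctness estimate, can be sketched as follows. The weights, bias, and feature vectors are fabricated for illustration; a real probe would be trained on labeled (hidden state, correctness) pairs, and this is not OPENIA's actual architecture:

```python
# Sketch: a probe that estimates the probability that generated code is
# correct from an internal representation vector. All numbers are invented.
import math

def probe(hidden_state, weights, bias):
    """Logistic-regression-style probe: P(code is correct | hidden state)."""
    z = sum(w * h for w, h in zip(weights, hidden_state)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights = [0.8, -0.5, 1.1]   # hypothetical learned probe weights
bias = -0.2

confident_state = [1.0, 0.1, 0.9]  # representation of likely-correct code
shaky_state = [0.1, 1.2, 0.0]      # representation of likely-buggy code

print(probe(confident_state, weights, bias))  # above 0.5 -> likely correct
print(probe(shaky_state, weights, bias))      # below 0.5 -> likely buggy
```

Because the probe reads only hidden states, it needs no unit tests or judge model at inference time, which is the appeal of internal-representation approaches.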
arXiv Detail & Related papers (2025-01-22T15:04:13Z)
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language [72.87064562349742]
We introduce LatentQA, the task of answering open-ended questions about model activations in natural language. We propose Latent Interpretation Tuning (LIT), which finetunes a decoder LLM on a dataset of activations and associated question-answer pairs. Our decoder also specifies a differentiable loss that we use to control models, such as debiasing models on stereotyped sentences and controlling the sentiment of generations.
arXiv Detail & Related papers (2024-12-11T18:59:33Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.