Related papers: Strengthening Programming Comprehension in Large Language Models through Code Generation

Strengthening Programming Comprehension in Large Language Models through Code Generation

URL: http://arxiv.org/abs/2508.12620v1
Date: Mon, 18 Aug 2025 04:33:03 GMT
Title: Strengthening Programming Comprehension in Large Language Models through Code Generation
Authors: Xiaoning Ren, Qiang Hu, Wei Ma, Yan Li, Yao Zhang, Lingxiao Jiang, Yinxing Xue,
Abstract summary: Large language models (LLMs) have recently shown impressive results on diverse code-related tasks.<n>Their grasp of fundamental programming concepts, such as data flow and control flow, remains shallow, leading to fragile performance when code requires deeper reasoning.<n>This work introduces a counterfactual code augmentation framework combined with concept-aware tuning, designed to guide LLMs toward stronger conceptual understanding.
Score: 23.72685095718304
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have recently shown impressive results on diverse code-related tasks, benefiting from large-scale training and instruction tuning. However, studies reveal that their grasp of fundamental programming concepts, such as data flow and control flow, remains shallow, leading to fragile performance when code requires deeper reasoning. This limitation restricts the practical adoption of LLMs in real-world software development. To address this issue, this work introduces a counterfactual code augmentation framework combined with concept-aware tuning, designed to guide LLMs toward stronger conceptual understanding. Comprehensive evaluation across multiple models and benchmarks demonstrates the effectiveness of the proposed approach.

Related papers

CodeSimpleQA: Scaling Factuality in Code Large Language Models [55.705748501461294]
We present CodeSimpleQA, a comprehensive benchmark designed to evaluate the factual accuracy of code LLMs in answering code-related questions.<n>We also create CodeSimpleQA-Instruct, a large-scale instruction corpus with 66M samples, and develop a post-training framework combining supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-12-22T14:27:17Z)
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code.<n>We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs.<n>We analyze the code capability of the general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder)
arXiv Detail & Related papers (2025-11-23T17:09:34Z)
Language Ranker: A Lightweight Ranking framework for LLM Decoding [70.01564145836129]
This paper conceptualizes the decoding process as analogous to the ranking stage in recommendation pipelines.<n>Motivated by this insight, we propose Language Ranker, a novel framework that introduces a lightweight module to rerank candidate responses.<n> Experiments show that Language Ranker achieves performance comparable to large-scale reward models, while requiring only 0.5M additional parameters.
arXiv Detail & Related papers (2025-10-23T17:56:46Z)
Discrete Tokenization for Multimodal LLMs: A Comprehensive Survey [69.45421620616486]
This work presents the first structured taxonomy and analysis of discrete tokenization methods designed for large language models (LLMs)<n>We categorize 8 representative VQ variants that span classical and modern paradigms and analyze their algorithmic principles, training dynamics, and integration challenges with LLM pipelines.<n>We identify key challenges including codebook collapse, unstable gradient estimation, and modality-specific encoding constraints.
arXiv Detail & Related papers (2025-07-21T10:52:14Z)
Pragmatic Reasoning improves LLM Code Generation [35.78260347663757]
We propose CodeRSA, a novel code candidate reranking mechanism built upon the Rational Speech Act (RSA) framework.<n>We evaluate CodeRSA using one of the latest Large Language Models on a popular code generation dataset.
arXiv Detail & Related papers (2025-02-20T12:44:26Z)
Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks [1.9198713957364215]
Large Language Models (LLMs) have shown remarkable capabilities in code generation tasks.<n>They face significant limitations in handling complex, long-context programming challenges.<n>This paper introduces a novel agentic framework for guided code generation''
arXiv Detail & Related papers (2025-01-11T19:21:53Z)
Analysis on LLMs Performance for Code Summarization [0.0]
Large Language Models (LLMs) have significantly advanced the field of code summarization.<n>This study aims to perform a comparative analysis of several open-source LLMs, namely LLaMA-3, Phi-3, Mistral, and Gemma.
arXiv Detail & Related papers (2024-12-22T17:09:34Z)
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding [1.3479499607624648]
Speculative decoding addresses bottleneck by introducing a two-stage framework: drafting and verification.<n>A smaller, efficient model generates a preliminary draft, which is then refined by a larger, more sophisticated model.<n>This paper provides a comprehensive survey of speculative decoding methods, categorizing them into draft-centric and model-centric approaches.
arXiv Detail & Related papers (2024-11-20T09:46:30Z)
VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning [10.70881967278009]
We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought snippets (CoT) reasoning with a visual Control Flow Graph (CFG)<n>We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
arXiv Detail & Related papers (2024-10-30T19:07:01Z)
A Thorough Examination of Decoding Methods in the Era of LLMs [72.65956436513241]
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. This paper provides a comprehensive and multifaceted analysis of various decoding methods within the context of large language models. Our findings reveal that decoding method performance is notably task-dependent and influenced by factors such as alignment, model size, and quantization.
arXiv Detail & Related papers (2024-02-10T11:14:53Z)
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding [46.485363806259265]
Speculative Decoding has emerged as a novel decoding paradigm for Large Language Models (LLMs) inference. In each decoding step, this method first drafts several future tokens efficiently and then verifies them in parallel. This paper presents a comprehensive overview and analysis of this promising decoding paradigm.
arXiv Detail & Related papers (2024-01-15T17:26:50Z)
A Principled Framework for Knowledge-enhanced Large Language Model [58.1536118111993]
Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning. This paper introduces a rigorously designed framework for creating LLMs that effectively anchor knowledge and employ a closed-loop reasoning process.
arXiv Detail & Related papers (2023-11-18T18:10:02Z)
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach [12.214585409361126]
Large language models (LLMs)- based code generation is a complex and powerful black-box model. We propose a novel causal graph-based representation of the prompt and the generated code. We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.