Related papers: NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging

NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging

URL: http://arxiv.org/abs/2505.15356v1
Date: Wed, 21 May 2025 10:38:50 GMT
Title: NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging
Authors: Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weinan Zhang, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu,
Abstract summary: Recent advancements in large language models (LLMs) have shifted attention toward leveraging natural language reasoning to enhance code-related tasks.<n>In this paper, we introduce NL-GING, a novel framework that employs natural language as an intermediate representation to improve code.
Score: 68.42255321759062
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Debugging is a critical aspect of LLM's coding ability. Early debugging efforts primarily focused on code-level analysis, which often falls short when addressing complex programming errors that require a deeper understanding of algorithmic logic. Recent advancements in large language models (LLMs) have shifted attention toward leveraging natural language reasoning to enhance code-related tasks. However, two fundamental questions remain unanswered: What type of natural language format is most effective for debugging tasks? And what specific benefits does natural language reasoning bring to the debugging process? In this paper, we introduce NL-DEBUGGING, a novel framework that employs natural language as an intermediate representation to improve code debugging. By debugging at a natural language level, we demonstrate that NL-DEBUGGING outperforms traditional debugging methods and enables a broader modification space through direct refinement guided by execution feedback. Our findings highlight the potential of natural language reasoning to advance automated code debugging and address complex programming challenges.

Related papers

MdEval: Massively Multilingual Code Debugging [37.48700033342978]
We propose the first massively multilingual debug benchmark, which includes 3.6K test samples of 18 programming languages.<n>We introduce the instruction corpora MDEVAL-INSTRUCT by injecting bugs into the correct multilingual queries and solutions.<n>Our experiments on MDEVAL reveal a notable performance gap between open-source models and closed-source LLMs.
arXiv Detail & Related papers (2024-11-04T17:36:40Z)
Integrating Natural Language Prompting Tasks in Introductory Programming Courses [3.907735250728617]
This report explores the inclusion of two prompt-focused activities in an introductory programming course. The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax. The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code.
arXiv Detail & Related papers (2024-10-04T01:03:25Z)
A Proposal for a Debugging Learning Support Environment for Undergraduate Students Majoring in Computer Science [0.0]
Students do not know how to use a debugger or have never used one. We implemented a function in Scratch that allows for self-learning of correct breakpoint placement.
arXiv Detail & Related papers (2024-07-25T03:34:19Z)
NoviCode: Generating Programs from Natural Language Utterances by Novices [59.71218039095155]
We present NoviCode, a novel NL Programming task which takes as input an API and a natural language description by a novice non-programmer. We show that NoviCode is indeed a challenging task in the code synthesis domain, and that generating complex code from non-technical instructions goes beyond the current Text-to-Code paradigm.
arXiv Detail & Related papers (2024-07-15T11:26:03Z)
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages [21.18996339478024]
We introduce emphsynthetic programming elicitation and compilation (SPEAC) SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language.
arXiv Detail & Related papers (2024-06-05T22:16:19Z)
AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents [38.580779075892636]
We develop a novel system for Code Representation and Execution (CoRE) The proposed system unifies natural language programming, pseudo-code programming, and flow programming under the same representation for constructing language agents. During the execution, we incorporate external memory to minimize redundancy.
arXiv Detail & Related papers (2024-05-11T04:29:03Z)
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to better interpret the programming domain knowledge.<n>CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
NExT: Teaching Large Language Models to Reason about Code Execution [50.93581376646064]
Large language models (LLMs) of code are typically trained on the surface textual form of programs. We propose NExT, a method to teach LLMs to inspect the execution traces of programs and reason about their run-time behavior.
arXiv Detail & Related papers (2024-04-23T01:46:32Z)
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event -- or a reasoning-graph. Existing approaches serialize the output graph as a flat list of nodes and edges. We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
arXiv Detail & Related papers (2022-10-13T16:09:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.