Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts
- URL: http://arxiv.org/abs/2404.19336v2
- Date: Wed, 1 May 2024 05:14:31 GMT
- Title: Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts
- Authors: Yanggyu Lee, Suchae Jeong, Jihie Kim
- Abstract summary: A key aspect of programming education is understanding and dealing with error messages.
However, 'logical errors', in which the program runs but operates against the programmer's intentions, do not receive error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in Chain-of-Thought and Tree-of-Thought prompts.
- Score: 1.7095867620640115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLMs trained in the understanding of programming syntax are now providing effective assistance to developers and are being used in programming education, such as in generating coding problem examples or providing code explanations. A key aspect of programming education is understanding and dealing with error messages. However, 'logical errors', in which the program operates against the programmer's intentions, do not receive error messages from the compiler. In this study, building on existing research on programming errors, we first define the types of logical errors that can occur in programming in general. Based on this definition, we propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in Chain-of-Thought and Tree-of-Thought prompts. The experimental results indicate that when such logical error descriptions are included in the prompt, the average classification performance is about 21% higher than without them. We also conducted an experiment on exploiting the relations among errors to generate a new logical error dataset using LLMs. As datasets for logical errors are very limited, such a benchmark dataset can be very useful for various programming-related applications. We expect that our work can assist novice programmers in identifying the causes of code errors and correcting them more effectively.
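A minimal sketch of what such a relation-aware Chain-of-Thought prompt could look like is shown below. The error taxonomy, the pairwise relations, and the buggy example program are illustrative assumptions, not the paper's exact definitions; sending the prompt to a chat-capable LLM is left to the reader.

```python
# A minimal sketch of a relation-aware Chain-of-Thought prompt for logical
# error classification. ERROR_TYPES and ERROR_RELATIONS are hypothetical
# placeholders, not the taxonomy defined in the paper.

ERROR_TYPES = {
    "condition error": "a branch condition tests the wrong expression or bound",
    "loop error": "a loop iterates too few or too many times",
    "variable error": "a variable is initialized, updated, or referenced incorrectly",
    "output error": "the result is computed correctly but printed in the wrong form",
}

# Relations among error types, e.g. pairs that are commonly confused.
ERROR_RELATIONS = [
    ("condition error", "loop error",
     "an off-by-one loop bound can look like a wrong comparison operator"),
    ("variable error", "output error",
     "printing a stale variable can mimic a formatting mistake"),
]

def build_cot_prompt(code: str) -> str:
    """Compose a CoT prompt that lists error types and their relations."""
    lines = ["You are classifying the logical error in a program.",
             "Known error types:"]
    lines += [f"- {name}: {desc}" for name, desc in ERROR_TYPES.items()]
    lines.append("Relations between error types (use them to disambiguate):")
    lines += [f"- {a} vs. {b}: {why}" for a, b, why in ERROR_RELATIONS]
    lines.append("Think step by step: describe what the code does, compare "
                 "the candidate error types, then output exactly one label.")
    lines.append("Program:\n" + code)
    return "\n".join(lines)

buggy = ("def mean(xs):\n"
         "    total = 0\n"
         "    for x in xs:\n"
         "        total += x\n"
         "    return total / (len(xs) - 1)  # wrong divisor\n")
print(build_cot_prompt(buggy))  # send this prompt to any chat-capable LLM
```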
Related papers
- Logic Error Localization in Student Programming Assignments Using Pseudocode and Graph Neural Networks [31.600659350609476]
We develop a system designed to localize logic errors within student programming assignments at the line level.
We employ a graph neural network to both localize and suggest corrections for logic errors.
Our experimental results are promising, demonstrating a localization accuracy of 99.2% for logic errors within the top-10 suspected lines.
arXiv Detail & Related papers (2024-10-11T01:46:24Z)
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
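As a rough illustration of the injection step, the sketch below builds a preference pair by making one subtle edit to a correct solution; the edit rule is invented for demonstration and is not RISE's actual procedure.

```python
# Illustrative sketch of error-injected pair construction (not RISE's actual
# edit rules): flip one operator in a correct solution to create a near-miss
# "rejected" sample for preference learning.
import re

def inject_subtle_error(solution: str) -> str:
    """Swap the first '+' for '-', a predefined subtle arithmetic error."""
    return re.sub(r"\+", "-", solution, count=1)

chosen = "total = price * qty + shipping"
rejected = inject_subtle_error(chosen)  # "total = price * qty - shipping"
preference_pair = {"chosen": chosen, "rejected": rejected}
print(preference_pair)
```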
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
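A toy version of such cross-checking, under the assumption that verification means comparing a CoT-derived answer against the output of an executable PoT program, might look like this (both model outputs are hand-written stand-ins):

```python
# Toy CoT/PoT cross-check: accept an answer only when the number parsed from
# a chain-of-thought matches the result of executing a program-of-thought.
# Both artifacts below stand in for real model outputs.

cot_answer = 42.0  # final number extracted from a model's chain-of-thought

pot_program = "result = 6 * 7  # program-of-thought emitted by the model"

namespace: dict = {}
exec(pot_program, namespace)           # run the PoT solution (trusted input only)
pot_answer = float(namespace["result"])

agree = abs(cot_answer - pot_answer) < 1e-6
print(f"CoT={cot_answer}, PoT={pot_answer}, agree={agree}")
```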
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice [1.106787864231365]
Despite promising evidence on synthetic benchmarks, we found that GPT-4-generated error messages outperformed conventional compiler error messages in only 1 of the 6 tasks.
arXiv Detail & Related papers (2024-09-27T11:45:56Z)
- Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models [5.162225137921625]
Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks.
We investigate the performance of two popular LLMs, GPT-3 and GPT-4, for detecting and providing a novice-friendly explanation of logic errors.
arXiv Detail & Related papers (2023-11-27T17:28:33Z)
- Understanding and Mitigating Classification Errors Through Interpretable Token Patterns [58.91023283103762]
Characterizing errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors.
We propose to discover those patterns of tokens that distinguish correct and erroneous predictions.
We show that our method, Premise, performs well in practice.
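The sketch below conveys only the underlying intuition, not the Premise algorithm itself: contrast token frequencies between correctly and incorrectly classified inputs to surface tokens associated with systematic errors (all data here is made up).

```python
# Crude frequency contrast illustrating the idea behind interpretable token
# patterns (Premise itself uses a more principled pattern-mining approach).
from collections import Counter

correct_inputs = ["the movie was great", "a fine film overall"]
error_inputs = ["not great at all", "not a fine film really"]

def token_counts(texts):
    return Counter(tok for text in texts for tok in text.split())

ok, err = token_counts(correct_inputs), token_counts(error_inputs)
# Tokens over-represented among misclassified inputs hint at failure modes.
suspicious = {tok: n for tok, n in err.items() if n > ok.get(tok, 0)}
print(suspicious)  # 'not' stands out, suggesting systematic negation errors
```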
arXiv Detail & Related papers (2023-11-18T00:24:26Z)
- Rule-Based Error Classification for Analyzing Differences in Frequent Errors [0.0]
We classify errors for 95,631 code pairs, submitted by programmers of various levels on an online judge system, and identify 3.47 errors per pair on average.
The analyzed results show that, on the same introductory problems, errors made by novices are due to a lack of programming knowledge.
On the other hand, errors made by experts are due to misunderstandings caused by careless reading of the problems or by attempting to solve problems differently than usual.
arXiv Detail & Related papers (2023-11-01T13:36:20Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
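As a rough stand-in for that structural signal (CIRS's actual scoring differs), the sketch below walks a Python AST and counts control-flow nodes weighted by nesting depth:

```python
# Stand-in for an AST-based "logical complexity" score: count control-flow
# nodes, weighting deeper nesting more heavily. Not the CIRS formula itself.
import ast

CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

def logical_complexity(source: str) -> int:
    tree = ast.parse(source)
    score = 0

    def visit(node, depth):
        nonlocal score
        for child in ast.iter_child_nodes(node):
            if isinstance(child, CONTROL_NODES):
                score += depth + 1   # deeper nesting counts more
                visit(child, depth + 1)
            else:
                visit(child, depth)

    visit(tree, 0)
    return score

snippet = "for i in range(3):\n    if i % 2:\n        print(i)"
print(logical_complexity(snippet))  # 3: the loop (1) plus the nested if (2)
```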
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- Using Large Language Models to Enhance Programming Error Messages [5.903720638984496]
Large language models can be used to create useful enhancements to programming error messages.
We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
arXiv Detail & Related papers (2022-10-20T23:17:26Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
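A toy version of that error-simulation step might read as follows; the confusion table is a small invented sample rather than the paper's error model learned from non-native speakers.

```python
# Toy grammatical-error injection: perturb clean text with simple confusions
# of the kind non-native speakers make. The table is illustrative only.
import random

CONFUSIONS = {"a": "an", "an": "a", "is": "are", "are": "is", "his": "him"}

def inject_grammatical_errors(sentence: str, p: float = 0.5, seed: int = 0) -> str:
    """Replace each confusable word with probability p (seeded for repeatability)."""
    rng = random.Random(seed)
    words = [CONFUSIONS[w] if w in CONFUSIONS and rng.random() < p else w
             for w in sentence.split()]
    return " ".join(words)

print(inject_grammatical_errors("there is a man in his car"))
```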
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.