Related papers: Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness

URL: http://arxiv.org/abs/2501.05706v1
Date: Fri, 10 Jan 2025 04:32:19 GMT
Title: Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness
Authors: Audrey Salmon, Katie Hammer, Eddie Antonio Santos, Brett A. Becker
Abstract summary: We show how GPT-3.5 is prompted for error explanations on just the erroneous source code itself. We report the baseline results of how effective the error explanations are at providing feedback.
Score: 0.9014547127329643
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Making errors is part of the programming process -- even for the most seasoned professionals. Novices in particular are bound to make many errors while learning. It is well known that traditional (compiler/interpreter) programming error messages have been less than helpful for many novices and can have effects such as being frustrating, containing confusing jargon, and being downright misleading. Recent work has found that large language models (LLMs) can generate excellent error explanations, but that the effectiveness of these error messages heavily depends on whether the LLM has been provided with context -- typically the original source code where the problem occurred. Knowing that programming error messages can be misleading and/or contain information that serves little-to-no use (particularly for novices), we explore the reverse: what happens when GPT-3.5 is prompted for error explanations on just the erroneous source code itself -- original compiler/interpreter produced error message excluded. We utilized various strategies to make more effective error explanations, including one-shot prompting and fine-tuning. We report the baseline results of how effective the error explanations are at providing feedback, as well as how various prompting strategies might improve the explanations' effectiveness. Our results can help educators understand how LLMs respond to such prompts that novices are bound to make, and hopefully lead to more effective use of Generative AI in the classroom.

Related papers

Understanding and Mitigating Errors of LLM-Generated RTL Code [7.747889860813149]
Large language model (LLM) based register-transfer-level (RTL) code generation is promising but the overall success rate remains unsatisfactory. We conduct a comprehensive error analysis and manual categorization. Our findings reveal that most errors stem from insufficient RTL programming knowledge, poor understanding of circuit concepts, or misinterpretation of complex multimodal inputs.
arXiv Detail & Related papers (2025-08-07T11:02:32Z)
Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
BugSpotter: Automated Generation of Code Debugging Exercises [22.204802715829615]
This paper introduces BugSpotter, a tool to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification.
arXiv Detail & Related papers (2024-11-21T16:56:33Z)
Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE). RISE injects predefined subtle errors into pivotal tokens in reasoning or steps to construct hard pairs for error mitigation. Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice [1.106787864231365]
Despite promising evidence on synthetic benchmarks, we found that GPT-4 generated error messages outperformed conventional compiler error messages in only 1 of the 6 tasks.
arXiv Detail & Related papers (2024-09-27T11:45:56Z)
Rectifier: Code Translation with Corrector via LLMs [11.38401806203093]
We propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability.
arXiv Detail & Related papers (2024-07-10T08:58:41Z)
Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages. 'Logical errors', in which the program operates against the programmer's intentions, do not receive error messages from the compiler. We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts.
arXiv Detail & Related papers (2024-04-30T08:03:22Z)
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems [50.76385564061713]
Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. CoT usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors, and step-missing errors. We propose Deeply Understanding the Problems (DUP) to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors.
arXiv Detail & Related papers (2024-04-23T12:16:05Z)
Learning From Mistakes Makes LLM Better Reasoner [106.48571828587728]
Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. This work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process.
arXiv Detail & Related papers (2023-10-31T17:52:22Z)
Knowledge-Augmented Language Model Verification [68.6099592486075]
Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. We propose to verify the output and the knowledge of the knowledge-augmented LMs with a separate verifier. Our results show that the proposed verifier effectively identifies retrieval and generation errors, allowing LMs to provide more factually correct outputs.
arXiv Detail & Related papers (2023-10-19T15:40:00Z)
Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks. We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning [55.76083560152823]
SelfCheck is a general-purpose zero-shot verification schema for recognizing errors in step-by-step reasoning. We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
arXiv Detail & Related papers (2023-08-01T10:31:36Z)
Using Large Language Models to Enhance Programming Error Messages [5.903720638984496]
Large language models can be used to create useful enhancements to programming error messages. We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
arXiv Detail & Related papers (2022-10-20T23:17:26Z)
On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data. Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.