Using Large Language Models to Enhance Programming Error Messages
- URL: http://arxiv.org/abs/2210.11630v1
- Date: Thu, 20 Oct 2022 23:17:26 GMT
- Title: Using Large Language Models to Enhance Programming Error Messages
- Authors: Juho Leinonen, Arto Hellas, Sami Sarsa, Brent Reeves, Paul Denny, James Prather, Brett A. Becker
- Abstract summary: Large language models can be used to create useful enhancements to programming error messages.
We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
- Score: 5.903720638984496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key part of learning to program is learning to understand programming error
messages. They can be hard to interpret and identifying the cause of errors can
be time-consuming. One factor in this challenge is that the messages are
typically intended for an audience that already knows how to program, or even
for programming environments that then use the information to highlight areas
in code. Researchers have been working on making these errors more novice-friendly
since the 1960s; however, progress has been slow. The present work
contributes to this stream of research by using large language models to
enhance programming error messages with explanations of the errors and
suggestions on how to fix the error. Large language models can be used to
create useful and novice-friendly enhancements to programming error messages
that sometimes surpass the original programming error messages in
interpretability and actionability. These results provide further evidence of
the benefits of large language models for computing educators, highlighting
their use in areas known to be challenging for students. We further discuss the
benefits and downsides of large language models and highlight future streams of
research for enhancing programming error messages.
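In practical terms, the approach amounts to capturing the original error message together with the offending code, sending both to a large language model, and presenting the model's explanation and fix suggestion to the learner. The sketch below is a minimal illustration of that loop; the OpenAI Python client, the model name, the prompt wording, and the file name are assumptions made for the example, not the authors' exact setup.

```python
# Illustrative sketch only: the model name, prompt wording, and client usage are
# assumptions for this example, not the setup used in the paper.
import subprocess
import sys

from openai import OpenAI  # assumes the openai Python package (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def enhance_error_message(source_path: str) -> str | None:
    """Run a program and, if it fails, ask an LLM to explain the error in plain language."""
    result = subprocess.run(
        [sys.executable, source_path], capture_output=True, text=True
    )
    if result.returncode == 0:
        return None  # the program ran cleanly; nothing to enhance

    with open(source_path) as f:
        code = f.read()

    prompt = (
        "A novice programmer ran the Python program below and got this error.\n\n"
        f"Program:\n{code}\n\n"
        f"Error message:\n{result.stderr}\n\n"
        "Explain in plain language what the error means and suggest how to fix it."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    explanation = enhance_error_message("student_program.py")  # hypothetical file
    print(explanation or "Program ran without errors.")
```

Any comparable chat or completion model could be substituted in the call above; the essential inputs are the learner's code and the verbatim error message.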
Related papers
- Multi-Task Program Error Repair and Explanatory Diagnosis [28.711745671275477]
We present a novel machine-learning approach for Multi-task Program Error Repair and Explanatory Diagnosis (mPRED).
A pre-trained language model is used to encode the source code, and a downstream model is specifically designed to identify and repair errors.
To aid in visualizing and analyzing the program structure, we use a graph neural network for program structure visualization.
arXiv Detail & Related papers (2024-10-09T05:09:24Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Various experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages.
However, 'logical errors', in which the program operates against the programmer's intentions, do not produce error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in Chain-of-Thought and Tree-of-Thought prompts (a minimal prompt-construction sketch along these lines appears after this list).
arXiv Detail & Related papers (2024-04-30T08:03:22Z)
- How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool? [1.2990666399718034]
We describe our experience of using CLARA, an automated repair tool, to provide feedback to novices.
First, we extended CLARA to support a larger subset of the Python language, before integrating it with the Jupyter Notebooks used for our programming exercises.
We found that novices often struggled to understand the proposed repairs, echoing the well-known challenge of understanding compiler/interpreter messages.
arXiv Detail & Related papers (2023-10-02T07:45:56Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? [5.714553194279462]
We investigate the various input parameters of two language models, and conduct a study to understand if variations of these input parameters can have a significant impact on the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
arXiv Detail & Related papers (2022-10-26T13:28:14Z)
- Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions [31.46148643917194]
We introduce a real-world dataset and task for predicting runtime errors.
We develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions.
We show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error.
arXiv Detail & Related papers (2022-03-07T23:17:17Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
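For the logical-error classification work listed above (Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts), the core mechanism can be sketched as prompt construction: enumerate candidate error types, state the relations among them, and ask the model to reason step by step before naming one. The sketch below is a minimal illustration; the error taxonomy, the relations, the model name, and the prompt wording are assumptions made for the example, not the paper's own.

```python
# Illustrative Chain-of-Thought prompt that encodes relations among error types.
# The error types, relations, model name, and wording are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical error taxonomy and pairwise relations supplied to the model as context.
ERROR_TYPES = {
    "off-by-one": "a loop or index starts or stops one step too early or too late",
    "wrong-condition": "a boolean condition does not match the intended logic",
    "wrong-operator": "an arithmetic or comparison operator is incorrect",
}
ERROR_RELATIONS = [
    "off-by-one errors are a special case of wrong-condition errors on loop bounds",
    "wrong-operator errors in comparisons often surface as wrong-condition errors",
]


def classify_logical_error(code: str, intended_behavior: str) -> str:
    """Ask an LLM to reason step by step about which logical error a program contains."""
    types = "\n".join(f"- {name}: {desc}" for name, desc in ERROR_TYPES.items())
    relations = "\n".join(f"- {r}" for r in ERROR_RELATIONS)
    prompt = (
        "The program below runs but does not do what the programmer intended, "
        "so the compiler or interpreter reports no error message.\n\n"
        f"Intended behavior: {intended_behavior}\n\nProgram:\n{code}\n\n"
        f"Possible logical error types:\n{types}\n\n"
        f"Relations among error types:\n{relations}\n\n"
        "Think step by step about how the program's behavior differs from the "
        "intent, then name the single most likely error type from the list."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    buggy = "def sum_to_n(n):\n    total = 0\n    for i in range(n):\n        total += i\n    return total\n"
    print(classify_logical_error(buggy, "return the sum of the integers 1 through n"))
```

A Tree-of-Thought variant would branch over several candidate reasoning paths before committing to a label; the single-pass prompt above only illustrates the Chain-of-Thought case.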