Using Large Language Models to Enhance Programming Error Messages
- URL: http://arxiv.org/abs/2210.11630v1
- Date: Thu, 20 Oct 2022 23:17:26 GMT
- Title: Using Large Language Models to Enhance Programming Error Messages
- Authors: Juho Leinonen, Arto Hellas, Sami Sarsa, Brent Reeves, Paul Denny,
James Prather, Brett A. Becker
- Abstract summary: Large language models can be used to create useful enhancements to programming error messages.
We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
- Score: 5.903720638984496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key part of learning to program is learning to understand programming error
messages. They can be hard to interpret and identifying the cause of errors can
be time-consuming. One factor in this challenge is that the messages are
typically intended for an audience that already knows how to program, or even
for programming environments that then use the information to highlight areas
in code. Researchers have been working on making these errors more novice-friendly
since the 1960s; however, progress has been slow. The present work
contributes to this stream of research by using large language models to
enhance programming error messages with explanations of the errors and
suggestions on how to fix the error. Large language models can be used to
create useful and novice-friendly enhancements to programming error messages
that sometimes surpass the original programming error messages in
interpretability and actionability. These results provide further evidence of
the benefits of large language models for computing educators, highlighting
their use in areas known to be challenging for students. We further discuss the
benefits and downsides of large language models and highlight future streams of
research for enhancing programming error messages.
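As a rough illustration of the idea in the abstract (pairing a program and its error message in a request for an explanation and a suggested fix), the following minimal sketch assembles such a prompt as plain text. The function name `build_enhancement_prompt` and the prompt wording are hypothetical and are not taken from the paper; no model is called, and the resulting string would be passed to whatever language model is available.

```python
def build_enhancement_prompt(source_code: str, error_message: str) -> str:
    """Assemble a plain-text prompt asking a model to explain an error and suggest a fix.

    Hypothetical helper for illustration only; the wording is an assumption,
    not the prompt used by the paper's authors.
    """
    return (
        "The following program produces an error.\n\n"
        f"Program:\n{source_code.strip()}\n\n"
        f"Error message:\n{error_message.strip()}\n\n"
        "Explain in plain English, for a beginner, what the error means, "
        "then suggest how to fix it."
    )


if __name__ == "__main__":
    # Example input: a Python snippet with an unclosed parenthesis.
    code = "print('hello'"
    err = "SyntaxError: '(' was never closed"
    print(build_enhancement_prompt(code, err))
```

Keeping the prompt construction separate from any model call makes it easy to vary the wording and compare the resulting explanations.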
Related papers
- DistiLRR: Transferring Code Repair for Low-Resource Programming Languages [57.62712191540067]
Distilling Low-Resource Repairs (DistiLRR) is an approach that transfers the reasoning and code generation ability from a teacher model to a student model.
Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages.
arXiv Detail & Related papers (2024-06-21T05:05:39Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
- CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation [60.799992690487336]
We propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks.
CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gain for cross-lingual code generation.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages.
'Logical errors', in which the program operates against the programmer's intentions, do not receive error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts.
arXiv Detail & Related papers (2024-04-30T08:03:22Z)
- How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool? [1.2990666399718034]
We describe our experience of using CLARA, an automated repair tool, to provide feedback to novices.
First, we extended CLARA to support a larger subset of the Python language, before integrating it with the Jupyter Notebooks used for our programming exercises.
We found that novices often struggled to understand the proposed repairs, echoing the well-known challenge to understand compiler/interpreter messages.
arXiv Detail & Related papers (2023-10-02T07:45:56Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? [5.714553194279462]
We investigate the various input parameters of two language models, and conduct a study to understand if variations of these input parameters can have a significant impact on the quality of the generated programs.
Our results showed that varying the input parameters can significantly improve the performance of language models.
arXiv Detail & Related papers (2022-10-26T13:28:14Z)
- What is it like to program with artificial intelligence? [10.343988028594612]
Large language models can generate code to solve a variety of problems expressed in natural language.
This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot.
We explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance.
arXiv Detail & Related papers (2022-08-12T10:48:46Z)
- Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions [31.46148643917194]
We introduce a real-world dataset and task for predicting runtime errors.
We develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions.
We show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error.
arXiv Detail & Related papers (2022-03-07T23:17:17Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.