Dcc --help: Generating Context-Aware Compiler Error Explanations with
Large Language Models
- URL: http://arxiv.org/abs/2308.11873v2
- Date: Mon, 16 Oct 2023 03:05:35 GMT
- Title: Dcc --help: Generating Context-Aware Compiler Error Explanations with
Large Language Models
- Authors: Andrew Taylor, Alexandra Vassar, Jake Renzella, and Hammond Pearce
- Abstract summary: dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
- Score: 53.04357141450459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the challenging field of introductory programming, high enrollments and
failure rates drive us to explore tools and systems to enhance student
outcomes, especially automated tools that scale to large cohorts. This paper
presents and evaluates the dcc --help tool, an integration of a Large Language
Model (LLM) into the Debugging C Compiler (DCC) to generate unique,
novice-focused explanations tailored to each error. dcc --help prompts an LLM
with contextual information of compile- and run-time error occurrences,
including the source code, error location and standard compiler error message.
The LLM is instructed to generate novice-focused, actionable error explanations
and guidance, designed to help students understand and resolve problems without
providing solutions. dcc --help was deployed to our CS1 and CS2 courses, with
2,565 students using the tool over 64,000 times in ten weeks. We analysed a
subset of these error/explanation pairs to evaluate their properties, including
conceptual correctness, relevancy, and overall quality. We found that the
LLM-generated explanations were conceptually accurate in 90% of compile-time
and 75% of run-time cases, but often disregarded the instruction not to provide
solutions in code. Our findings, observations and reflections following
deployment indicate that dcc --help provides novel opportunities for scaffolding
students' introduction to programming.
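The prompting step described in the abstract (source code, error location, and standard compiler error message, plus an instruction not to provide solutions) could be sketched roughly as follows. This is a minimal illustration only: the function name `build_prompt`, the `context` parameter, and the prompt wording are all hypothetical and are not taken from dcc --help itself. It covers the compile-time case only.

```python
def build_prompt(source: str, line: int, message: str, context: int = 3) -> str:
    """Assemble a hypothetical novice-focused prompt for one compiler error.

    source  -- full text of the student's C file
    line    -- 1-based line number reported by the compiler
    message -- the standard compiler error message
    context -- number of lines to show on each side of the error (assumed)
    """
    lines = source.splitlines()
    start = max(0, line - 1 - context)
    end = min(len(lines), line + context)
    # Mark the offending line with '>' so the model can locate the error.
    excerpt = "\n".join(
        f"{'>' if n == line else ' '} {n:4d} | {text}"
        for n, text in enumerate(lines[start:end], start=start + 1)
    )
    # Illustrative wording only; the real tool's prompt is not shown in the source.
    return (
        "Explain this C compiler error to a novice programmer.\n"
        "Give actionable guidance, but do not provide the solution in code.\n\n"
        f"Compiler message (line {line}):\n{message}\n\n"
        f"Source excerpt:\n{excerpt}\n"
    )


if __name__ == "__main__":
    demo = "int main(void) {\n    int x = 1\n    return x;\n}\n"
    print(build_prompt(demo, 2, "error: expected ';' at end of declaration"))
```

The returned string would then be sent to whichever LLM is in use. As the abstract notes, an instruction like the one above is often disregarded, so a real deployment would likely need additional safeguards.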
Related papers
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
arXiv Detail & Related papers (2024-06-22T15:52:04Z)
- Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages.
However, 'logical errors', in which the program operates against the programmer's intentions, do not receive error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts.
arXiv Detail & Related papers (2024-04-30T08:03:22Z)
- Explaining EDA synthesis errors with LLMs [10.665347817363623]
Large Language Models (LLMs) have demonstrated text comprehension and question-answering capabilities.
We generate 936 error message explanations using three OpenAI LLMs over 21 different buggy code samples.
These are then graded for relevance and correctness, and we find that in approximately 71% of cases the LLMs give correct & complete explanations suitable for novice learners.
arXiv Detail & Related papers (2024-04-07T07:12:16Z)
- Patterns of Student Help-Seeking When Using a Large Language Model-Powered Programming Assistant [2.5949084781328744]
This study examines students' use of an innovative tool that provides on-demand programming assistance without revealing solutions directly.
We collected more than 2,500 queries submitted by students throughout the term.
We found that most queries requested immediate help with programming assignments, whereas fewer requests asked for help on related concepts or for deepening conceptual understanding.
arXiv Detail & Related papers (2023-10-25T20:36:05Z)
- Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes [2.5949084781328744]
Large language models (LLMs) have emerged recently and show great promise for providing on-demand help at a large scale.
We introduce CodeHelp, a novel LLM-powered tool designed with guardrails to provide on-demand assistance to programming students without directly revealing solutions.
Our findings suggest that CodeHelp is well-received by students who especially value its availability and help with resolving errors, and that for instructors it is easy to deploy and complements, rather than replaces, the support that they provide to students.
arXiv Detail & Related papers (2023-08-14T03:52:24Z)
- Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [75.74103236299477]
Chain-of-thought (CoT) prompting and tool augmentation have been validated as effective practices for improving large language models.
We propose a new approach that can deliberate the reasoning steps with tool interfaces, namely DELI.
Experimental results on CARP and six other datasets show that the proposed DELI mostly outperforms competitive baselines.
arXiv Detail & Related papers (2023-06-04T17:02:59Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- SYNFIX: Automatically Fixing Syntax Errors using Compiler Diagnostics [0.0]
Students could be helped, and instructors' time saved, by automated repair suggestions when dealing with syntax errors.
We introduce SYNFIX, a machine-learning based tool that substantially improves on the state-of-the-art.
We have built SYNFIX into a free, open-source version of Visual Studio Code; we make all our source code and models freely available.
arXiv Detail & Related papers (2021-04-29T21:57:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.