Related papers: BugSpotter: Automated Generation of Code Debugging Exercises

URL: http://arxiv.org/abs/2411.14303v2
Date: Mon, 25 Nov 2024 08:31:00 GMT
Title: BugSpotter: Automated Generation of Code Debugging Exercises
Authors: Victor-Alexandru Pădurean, Paul Denny, Adish Singla
Abstract summary: This paper introduces BugSpotter, a tool to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification.
Score: 22.204802715829615
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), the ability for students to reason about code and identify errors is increasingly important. However, students frequently resort to trial-and-error methods to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, an innovative tool that leverages an LLM to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This not only provides opportunities for students to enhance their debugging skills, but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well-matched to the problem specifications. Importantly, the LLM-generated exercises were comparable to those manually created by instructors with respect to student performance, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.
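
The exercise format described in the abstract can be illustrated with a minimal sketch. It is not BugSpotter's implementation: the problem (largest element of a list), the functions `reference_max` and `buggy_max`, and the helpers `exposes_bug` and `bug_is_verified` are all hypothetical names chosen for illustration. The sketch assumes a student's test case counts as "failing" when the buggy code's output differs from the output of a correct reference solution, and that a synthesized bug counts as verified when at least one test in a suite exposes it.

```python
from typing import Callable, Iterable, Sequence

Solution = Callable[[list[int]], int]


def reference_max(values: list[int]) -> int:
    """Hypothetical correct solution: largest element of a non-empty list."""
    return max(values)


def buggy_max(values: list[int]) -> int:
    """Hypothetical injected bug: the loop never inspects the last element."""
    best = values[0]
    for v in values[:-1]:  # bug: should iterate over all of `values`
        if v > best:
            best = v
    return best


def exposes_bug(test_input: Sequence[int], reference: Solution, buggy: Solution) -> bool:
    """A test input is 'failing' if the buggy output differs from the expected one."""
    return reference(list(test_input)) != buggy(list(test_input))


def bug_is_verified(suite: Iterable[Sequence[int]], reference: Solution, buggy: Solution) -> bool:
    """Assumed acceptance rule: keep a bug only if some test in the suite exposes it."""
    return any(exposes_bug(t, reference, buggy) for t in suite)


if __name__ == "__main__":
    # A student-designed test that places the maximum last exposes the bug.
    print(exposes_bug([1, 2, 3], reference_max, buggy_max))
    # A test with the maximum first does not.
    print(exposes_bug([3, 2, 1], reference_max, buggy_max))
```

In this sketch, a student who reads the specification carefully would look for inputs where the maximum sits in the final position, which is the kind of hypothesis-driven test design the abstract describes.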

Related papers

Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z)
NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging [68.42255321759062]
Recent advancements in large language models (LLMs) have shifted attention toward leveraging natural language reasoning to enhance code-related tasks. In this paper, we introduce NL-Debugging, a novel framework that employs natural language as an intermediate representation to improve code debugging.
arXiv Detail & Related papers (2025-05-21T10:38:50Z)
Learning Code-Edit Embedding to Model Student Debugging Behavior [2.1485350418225244]
We propose an encoder-decoder-based model that learns meaningful code-edit embeddings between consecutive student code submissions. It enables personalized next-step code suggestions that maintain the student's coding style while improving test case correctness.
arXiv Detail & Related papers (2025-02-26T18:54:39Z)
Effective Large Language Model Debugging with Best-first Tree Search [27.68711322875045]
Large Language Models (LLMs) show promise in code generation tasks. However, LLMs cannot consistently spot and fix bugs. We propose an algorithm to enable LLMs to debug their code via self-reflection and search, where a model attempts to identify its previous mistakes.
arXiv Detail & Related papers (2024-07-26T19:26:00Z)
A Proposal for a Debugging Learning Support Environment for Undergraduate Students Majoring in Computer Science [0.0]
Students do not know how to use a debugger or have never used one. We implemented a function in Scratch that allows for self-learning of correct breakpoint placement.
arXiv Detail & Related papers (2024-07-25T03:34:19Z)
Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging [27.70379206820154]
Socratic questioning is an effective teaching strategy, encouraging critical thinking and problem-solving. TreeInstruct asks probing questions to help students independently identify and resolve errors. It estimates a student's conceptual and syntactical knowledge to dynamically construct a question tree based on their responses and current knowledge state.
arXiv Detail & Related papers (2024-06-17T16:28:21Z)
Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks. But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal. We propose an in-context learning approach that guides LLMs to debug by using a "print debug" method.
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs). It covers four major bug categories and 18 minor types in C++, Java, and Python. We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging [28.321080454393687]
Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. We introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents code.
arXiv Detail & Related papers (2023-10-08T21:39:47Z)
NuzzleBug: Debugging Block-Based Programs in Scratch [11.182625995483862]
NuzzleBug is an extension of the popular block-based programming environment Scratch. It is an interrogative debugger that enables users to ask questions about executions and provides answers. We find that teachers consider NuzzleBug to be useful, and children can use it to debug faulty programs effectively.
arXiv Detail & Related papers (2023-09-25T18:56:26Z)
Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation. We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.