Related papers: How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

URL: http://arxiv.org/abs/2310.05292v5
Date: Thu, 10 Oct 2024 16:29:59 GMT
Title: How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging
Authors: Qianou Ma, Hua Shen, Kenneth Koedinger, Tongshuang Wu
Abstract summary: Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. We introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code.
Score: 28.321080454393687
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) now excel at generative skills and can create content at impeccable speeds. However, they are imperfect and still make various mistakes. In a Computer Science education context, as these models are widely recognized as "AI pair programmers," it becomes increasingly important to train students on evaluating and debugging the LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student performance on debugging by 12% in the pre-to-post test.
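To make the division of labor concrete, here is a minimal, hypothetical Python sketch of one debugging exercise of the kind the abstract describes. It is not HypoCompass's code or material format: the buggy function `agent_average`, the `reference_average` specification, the `Hypothesis` record, and the test inputs are all invented for illustration. The student's part is finding inputs that expose the bug and stating a hypothesis about its cause; writing the fix would be offloaded to the agent.

```python
from dataclasses import dataclass
from typing import Callable, List


def agent_average(nums: List[int]) -> float:
    """Hypothetical code written by an LLM agent: mean of a non-empty list."""
    total = 0
    for i in range(1, len(nums)):  # bug: the loop skips nums[0]
        total += nums[i]
    return total / len(nums)


def reference_average(nums: List[int]) -> float:
    """Behavior required by the (hypothetical) problem specification."""
    return sum(nums) / len(nums)


@dataclass
class Hypothesis:
    """A student's diagnosis: the evidence plus a natural-language cause."""
    failing_input: List[int]
    expected: float
    actual: float
    cause: str


def exposing_inputs(
    candidate: Callable[[List[int]], float],
    spec: Callable[[List[int]], float],
    inputs: List[List[int]],
) -> List[List[int]]:
    """Keep only the inputs on which the candidate disagrees with the spec."""
    return [x for x in inputs if candidate(x) != spec(x)]


# A test whose first element is 0 cannot reveal this bug.
tests = [[0, 4], [1, 2, 3], [5]]
failing = exposing_inputs(agent_average, reference_average, tests)

first = failing[0]
hypothesis = Hypothesis(
    failing_input=first,
    expected=reference_average(first),
    actual=agent_average(first),
    cause="The loop starts at index 1, so the first element is never added.",
)
```

The input `[0, 4]` passes even though the code is wrong, which is the kind of observation a hypothesis-driven debugger has to reason about before proposing a cause.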

Related papers

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs. LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data. Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
arXiv Detail & Related papers (2025-01-24T06:39:38Z)
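One plausible reading of the timestamp-based, contamination-resistant evaluation can be sketched in Python: if every item carries the date it was posted, a model is scored only on items that appeared after its training-data cutoff, and new items can be appended as the forum grows. The `Item` record, the cutoff date, and the monthly grouping are assumptions for illustration, not the LiveAoPSBench pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Item:
    """One benchmark question with the date it was posted (hypothetical schema)."""
    question: str
    posted: date


def uncontaminated(items: List[Item], training_cutoff: date) -> List[Item]:
    """Keep items posted strictly after the model's training cutoff."""
    return [it for it in items if it.posted > training_cutoff]


def by_month(items: List[Item]) -> Dict[str, List[Item]]:
    """Group items by posting month so scores can be tracked over time."""
    groups: Dict[str, List[Item]] = defaultdict(list)
    for it in items:
        groups[it.posted.strftime("%Y-%m")].append(it)
    return dict(groups)


items = [
    Item("q1", date(2023, 11, 3)),
    Item("q2", date(2024, 2, 14)),
    Item("q3", date(2024, 2, 20)),
    Item("q4", date(2024, 5, 1)),
]
fresh = uncontaminated(items, training_cutoff=date(2023, 12, 31))
```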
BugSpotter: Automated Generation of Code Debugging Exercises [22.204802715829615]
This paper introduces BugSpotter, a tool to generate buggy code from a problem description and verify the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification.
arXiv Detail & Related papers (2024-11-21T16:56:33Z)
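A minimal Python sketch of the two checks the summary describes, under loud assumptions: BugSpotter uses an LLM to synthesize bugs, whereas here a few hand-written text substitutions (`MUTATIONS`) stand in for that generator, and the problem (`count_at_most`), its reference solution, and the test suite are hypothetical. A candidate bug counts as verified only if it fails at least one test in the suite; a student's test case counts as failing when its expected value agrees with the specification but the buggy code's output does not.

```python
from typing import Callable, Dict, List, Tuple

Fn = Callable[[List[int], int], int]

# Hypothetical problem: count how many values in nums are <= limit.
REFERENCE_SRC = """
def count_at_most(nums, limit):
    return sum(1 for x in nums if x <= limit)
"""

# Stand-in for an LLM bug generator: (old, new) text substitutions.
MUTATIONS = [("<=", "<"), ("<=", ">="), ("sum(1", "sum(0")]

# Test suite derived from the specification: ((nums, limit), expected).
TEST_SUITE: List[Tuple[Tuple[List[int], int], int]] = [
    (([1, 2, 3], 2), 2),
    (([], 5), 0),
    (([4, 4], 4), 2),
]


def load(src: str) -> Fn:
    """Execute source text and return the count_at_most function it defines."""
    namespace: Dict[str, object] = {}
    exec(src, namespace)
    return namespace["count_at_most"]  # type: ignore[return-value]


def fails_suite(fn: Fn) -> bool:
    """True if at least one test in the suite disagrees with fn."""
    return any(fn(*args) != expected for args, expected in TEST_SUITE)


def verified_bugs() -> List[str]:
    """Keep only mutants that the test suite actually catches."""
    bugs = []
    for old, new in MUTATIONS:
        mutant = REFERENCE_SRC.replace(old, new, 1)
        if mutant != REFERENCE_SRC and fails_suite(load(mutant)):
            bugs.append(mutant)
    return bugs


def is_failing_test(buggy: Fn, args: Tuple[List[int], int], expected: int) -> bool:
    """A student test is valid if it matches the spec but the bug breaks it."""
    reference = load(REFERENCE_SRC)
    return reference(*args) == expected and buggy(*args) != expected
```

The `>=` mutant passes this suite (`[1, 2, 3]` with limit 2 yields 2 under both `<=` and `>=`), so it is discarded as unverified even though it is wrong; a stronger suite would catch it.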
A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation [0.0]
27 recent papers have been reviewed and split into two groups. The first group consists of new methods for bug detection and repair, which include locating semantic errors. The second group dwells on code generation, providing an overview of both general-purpose LLMs fine-tuned for programming and task-specific models. It also presents methods to improve code generation, such as identifier-aware training, fine-tuning at the instruction level, and incorporating semantic code structures.
arXiv Detail & Related papers (2024-11-12T06:47:54Z)
Impeding LLM-assisted Cheating in Introductory Programming Assignments via Adversarial Perturbation [42.49889252988544]
Large language model (LLM)-based programming assistants can help improve the productivity of professional software developers, but can also facilitate cheating in introductory computer programming courses. This paper investigates the baseline performance of 5 widely used LLMs on a collection of introductory programming problems, examines adversarial perturbations to degrade their performance, and describes the results of a user study aimed at understanding the efficacy of such perturbations in hindering actual code generation for introductory programming assignments.
arXiv Detail & Related papers (2024-10-12T01:01:00Z)
zsLLMCode: An Effective Approach for Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
This paper proposes a novel zero-shot approach, zsLLMCode, to generate code embeddings by using large language models (LLMs) and sentence embedding models. The results have demonstrated the effectiveness and superiority of our method over state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives. Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance. In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation. Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs. We propose a Case2Code task by exploiting the expressiveness and correctness of programs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions. We develop a taxonomy of bugs for incorrect code that includes three categories and 12 sub-categories, and analyze the root cause for common bug types. We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
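The iterative repair loop can be sketched in Python as follows, assuming compiler feedback comes from Python's built-in `compile()` and that the model's critique-and-revise step is a function supplied by the caller. `toy_revise` is a hypothetical, rule-based placeholder for that LLM call, and the bug-type part of the critique is omitted, so this shows only the control flow of feeding errors back until the code compiles.

```python
from typing import Callable, List, Optional, Tuple


def compiler_feedback(src: str) -> Optional[str]:
    """Return a short error message if src fails to compile, else None."""
    try:
        compile(src, "<candidate>", "exec")
        return None
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"


def self_critique_loop(
    src: str,
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Feed compiler errors back to the reviser until the code compiles."""
    history: List[str] = []
    for _ in range(max_rounds):
        feedback = compiler_feedback(src)
        if feedback is None:
            break
        history.append(feedback)
        src = revise(src, feedback)
    return src, history


def toy_revise(src: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM: adds a missing colon after a def header."""
    fixed = [
        line + ":" if line.startswith("def ") and not line.endswith(":") else line
        for line in src.splitlines()
    ]
    return "\n".join(fixed) + "\n"


candidate = "def inc(x)\n    return x + 1\n"
final_src, feedback_log = self_critique_loop(candidate, toy_revise)
```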
LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement [93.38736019287224]
"LLMs-as-Instructors" framework autonomously enhances the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model. Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors.
arXiv Detail & Related papers (2024-06-29T17:16:04Z)
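The two strategies differ mainly in which responses are handed to the instructor model. A minimal Python sketch of that selection step, assuming each response already has a correctness label; the `Record` schema is hypothetical, and neither the instructor's analysis nor the contrastive training of the target model is shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One target-model response with a correctness label (hypothetical schema)."""
    question: str
    response: str
    correct: bool


def learning_from_error(records: List[Record]) -> List[Record]:
    """Select only the incorrect responses for the instructor to analyze."""
    return [r for r in records if not r.correct]


def learning_from_error_by_contrast(
    records: List[Record],
) -> List[Tuple[Record, Record]]:
    """Pair each correct response with each incorrect one to the same question."""
    by_question: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        by_question[r.question].append(r)
    pairs: List[Tuple[Record, Record]] = []
    for group in by_question.values():
        right = [r for r in group if r.correct]
        wrong = [r for r in group if not r.correct]
        pairs.extend((good, bad) for good in right for bad in wrong)
    return pairs


records = [
    Record("2+2?", "4", True),
    Record("2+2?", "5", False),
    Record("3*3?", "6", False),
    Record("5-1?", "4", True),
]
```

Here `3*3?` contributes an error example to the first strategy but no pair to the second, because the target model never answered it correctly.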
An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering Project [1.433758865948252]
Large Language Models (LLMs) represent a leap in artificial intelligence, excelling in tasks using human language(s). In this paper, we analyze the AI-generated code, the prompts used for code generation, and the level of human intervention needed to integrate the code into the code base. Our findings suggest that LLMs can play a crucial role in the early stages of software development.
arXiv Detail & Related papers (2024-01-29T14:32:32Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code). Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill such reasoning ability to smaller LMs. We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm. To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety. Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs. We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.