Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students
- URL: http://arxiv.org/abs/2410.09193v1
- Date: Fri, 11 Oct 2024 18:51:58 GMT
- Title: Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students
- Authors: Stephen MacNeil, Magdalena Rogalska, Juho Leinonen, Paul Denny, Arto Hellas, Xandria Crosland
- Abstract summary: Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data.
This research paper examines the distribution of bugs generated by LLMs in contrast to those produced by computing students.
- Score: 4.949067768845775
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to privacy reasons. In this research paper, we conduct a comparative study examining the distribution of bugs generated by LLMs in contrast to those produced by computing students. Leveraging data from two previous large-scale analyses of student-generated bugs, we investigate whether LLMs can be coaxed to exhibit bug patterns that are similar to authentic student bugs when prompted to inject errors into code. The results suggest that, left unguided, LLMs do not generate plausible error distributions, and many of the generated errors are unlikely to be produced by real students. However, with guidance including descriptions of common errors and typical frequencies, LLMs can be shepherded to generate realistic distributions of errors in synthetic code.
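To make the guided condition concrete, here is a minimal sketch of how a prompt could embed common-error descriptions and target frequencies before asking an LLM to inject a bug. The bug categories and frequencies are placeholders, not values from the paper, and this is not the authors' actual prompt.

```python
import random

# Hypothetical catalogue of common student bug types with relative
# frequencies; placeholder values, not figures reported in the paper.
BUG_DISTRIBUTION = {
    "off-by-one error in a loop bound": 0.30,
    "using = instead of == in a condition": 0.25,
    "missing return statement": 0.20,
    "wrong variable used in an expression": 0.15,
    "incorrect operator precedence": 0.10,
}

def build_guided_prompt(correct_code: str) -> str:
    """Build a prompt asking an LLM to inject one bug, sampled
    according to the target frequency distribution."""
    bug = random.choices(
        population=list(BUG_DISTRIBUTION),
        weights=list(BUG_DISTRIBUTION.values()),
    )[0]
    guidance = "\n".join(
        f"- {name}: about {freq:.0%} of student submissions"
        for name, freq in BUG_DISTRIBUTION.items()
    )
    return (
        "Students commonly make these errors, at these frequencies:\n"
        f"{guidance}\n\n"
        f"Introduce exactly one '{bug}' into the program below, "
        "changing nothing else:\n\n"
        f"{correct_code}"
    )

print(build_guided_prompt("def mean(xs):\n    return sum(xs) / len(xs)"))
```

Sampling the injected bug type from the target distribution, rather than letting the model choose freely, is one way to steer the aggregate error distribution toward the student-derived frequencies.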
Related papers
- Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods.
In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z)
- Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework [64.83955753606443]
Math Word Problems serve as a crucial benchmark for evaluating Large Language Models' reasoning abilities.
Current error classification methods rely on static and predefined categories.
We introduce MWPES-300K, a comprehensive dataset containing 304,865 error samples.
arXiv Detail & Related papers (2025-01-26T16:17:57Z)
- LLM-itation is the Sincerest Form of Data: Generating Synthetic Buggy Code Submissions for Computing Education [5.421088637597145]
Large language models (LLMs) offer a promising approach to create large-scale, privacy-preserving synthetic data.
This work explores generating synthetic buggy code submissions for introductory programming exercises using GPT-4o.
We compare the distribution of test case failures between synthetic and real student data from two courses to assess how accurately the synthetic data mimics real student data.
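As a rough illustration of this comparison, one could tally how often each test case fails in the real and synthetic submission pools and measure the distance between the two distributions. The sketch below uses invented failure counts and total variation distance, not necessarily the paper's metric.

```python
from collections import Counter

# Invented example data: one entry per observed test-case failure,
# aggregated across all submissions in each pool.
real_failures = ["t1", "t1", "t1", "t2", "t2", "t2", "t3", "t4"]
synthetic_failures = ["t1", "t2", "t2", "t3", "t3", "t3", "t4", "t4"]

def failure_distribution(failures: list[str]) -> dict[str, float]:
    """Normalize raw failure counts into per-test-case frequencies."""
    counts = Counter(failures)
    total = sum(counts.values())
    return {test: n / total for test, n in counts.items()}

real = failure_distribution(real_failures)
synth = failure_distribution(synthetic_failures)

# Total variation distance: 0.0 for identical distributions, 1.0 for disjoint.
tests = set(real) | set(synth)
tvd = 0.5 * sum(abs(real.get(t, 0.0) - synth.get(t, 0.0)) for t in tests)
print(f"total variation distance: {tvd:.3f}")
```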
arXiv Detail & Related papers (2024-11-01T00:24:59Z)
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
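A minimal sketch of the pair-construction idea: edit a predefined subtle error into an otherwise correct solution and keep the (correct, corrupted) pair for preference learning. The example is invented and far simpler than RISE's token-level self-editing.

```python
# Invented (chosen, rejected) pair built by editing a subtle arithmetic
# slip into a correct solution; RISE itself injects errors into partial
# tokens during self-editing, not via simple string replacement.
correct_solution = (
    "Step 1: total apples = 3 * 4 = 12\n"
    "Step 2: remaining = 12 - 5 = 7\n"
    "Answer: 7"
)

# The predefined subtle error: miscompute 12 - 5 and propagate it.
rejected_solution = (
    correct_solution
    .replace("12 - 5 = 7", "12 - 5 = 8")
    .replace("Answer: 7", "Answer: 8")
)

preference_pair = {"chosen": correct_solution, "rejected": rejected_solution}
for role, text in preference_pair.items():
    print(f"--- {role} ---\n{text}\n")
```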
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations [46.351064535592336]
Large language models (LLMs) often produce errors, including factual inaccuracies, biases, and reasoning failures.
Recent studies have demonstrated that LLMs' internal states encode information regarding the truthfulness of their outputs.
We show that the internal representations of LLMs encode much more information about truthfulness than previously recognized.
arXiv Detail & Related papers (2024-10-03T17:31:31Z)
- Case2Code: Scalable Synthetic Data for Code Generation [105.89741089673575]
Large Language Models (LLMs) have shown outstanding breakthroughs in code generation.
Recent work improves code LLMs by training on synthetic data generated by some powerful LLMs.
We propose a Case2Code task by exploiting the expressiveness and correctness of programs.
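One reading of this task construction is that input-output cases can be harvested mechanically by executing a known-correct program, and a model is then trained to recover the program from its cases. The reference function and inputs below are illustrative only.

```python
# Harvest input-output cases by executing a known-correct program; the
# learning task would then be to infer the program from the cases.
# Function and inputs are invented for illustration.
def reference_program(xs: list[int]) -> int:
    return max(xs) - min(xs)

sample_inputs = [[1, 5, 3], [7], [-2, 0, 9, 4]]
cases = [(xs, reference_program(xs)) for xs in sample_inputs]
for inputs, output in cases:
    print(f"f({inputs}) == {output}")
```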
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
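A schematic sketch of such a training-free critique-and-repair loop, assuming a hypothetical `llm` callable (prompt string in, code string out) and using Python's `py_compile` as a stand-in for compiler feedback; this is not the paper's implementation.

```python
import py_compile
import tempfile

def compile_error(code: str) -> str | None:
    """Return the compiler's error message, or None if compilation succeeds."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        py_compile.compile(path, doraise=True)
        return None
    except py_compile.PyCompileError as exc:
        return str(exc)

def critique_and_repair(llm, task: str, max_rounds: int = 3) -> str:
    """Generate code, then iteratively self-critique using compiler feedback.

    `llm` is a hypothetical callable: prompt string in, code string out.
    """
    code = llm(f"Write Python code for: {task}")
    for _ in range(max_rounds):
        error = compile_error(code)
        if error is None:
            return code  # compiles; a real system would also run tests
        code = llm(
            f"Your code failed to compile:\n{error}\n"
            "Critique your code, name the likely bug type, and return a "
            f"corrected version:\n{code}"
        )
    return code
```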
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- Towards Understanding the Characteristics of Code Generation Errors Made by Large Language Models [10.519984835232359]
Large Language Models (LLMs) have demonstrated unprecedented capabilities in code generation.
We conducted an in-depth analysis of code generation errors across six representative LLMs on the HumanEval dataset.
Our findings highlighted several challenges in locating and fixing code generation errors made by LLMs.
arXiv Detail & Related papers (2024-06-13T01:29:52Z)
- Improving LLM Classification of Logical Errors by Integrating Error Relationship into Prompts [1.7095867620640115]
A key aspect of programming education is understanding and dealing with error messages.
However, 'logical errors', in which the program operates against the programmer's intentions, do not produce error messages from the compiler.
We propose an effective approach for detecting logical errors with LLMs that makes use of relations among error types in the Chain-of-Thought and Tree-of-Thought prompts.
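In the spirit of that approach, a prompt might spell out relations among error types before asking for step-by-step classification. The taxonomy and relations below are invented placeholders, not the paper's actual categories.

```python
# Invented error taxonomy and pairwise relations; placeholders, not the
# paper's actual categories.
ERROR_RELATIONS = {
    ("off-by-one", "wrong loop condition"): "both distort iteration counts",
    ("wrong operator", "operator precedence"): "both corrupt expressions",
}

def build_cot_prompt(code: str) -> str:
    """Assemble a Chain-of-Thought prompt that encodes error-type relations."""
    relations = "\n".join(
        f"- '{a}' relates to '{b}': {why}"
        for (a, b), why in ERROR_RELATIONS.items()
    )
    return (
        "Classify the logical error in the program below.\n"
        "Known relations between error types:\n"
        f"{relations}\n\n"
        "Think step by step: first rule out unrelated types, then decide "
        "between related ones.\n\n"
        f"Program:\n{code}"
    )

print(build_cot_prompt("total = 0\nfor i in range(1, len(xs)): total += xs[i]"))
```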
arXiv Detail & Related papers (2024-04-30T08:03:22Z)
- Decoding Logic Errors: A Comparative Study on Bug Detection by Students and Large Language Models [5.162225137921625]
Large language models (LLMs) have recently demonstrated surprising performance for a range of computing tasks.
We investigate the performance of two popular LLMs, GPT-3 and GPT-4, for detecting and providing a novice-friendly explanation of logic errors.
arXiv Detail & Related papers (2023-11-27T17:28:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.