Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades
- URL: http://arxiv.org/abs/2404.05988v1
- Date: Tue, 9 Apr 2024 03:45:15 GMT
- Title: Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades
- Authors: Valdemar Švábenský, Maciej Pankiewicz, Jiayi Zhang, Elizabeth B. Cloude, Ryan S. Baker, Eric Fouh
- Abstract summary: This study examined the relationships between students' rate of programming errors and their grades on two exams.
Data were collected from 280 students in a Java programming course.
- Score: 11.799817851619757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Programming courses can be challenging for first-year university students, especially for those without prior coding experience. Students initially struggle with code syntax, but as more advanced topics are introduced across a semester, the difficulty in learning to program shifts to learning computational thinking (e.g., debugging strategies). This study examined the relationships between students' rate of programming errors and their grades on two exams. Using an online integrated development environment, data were collected from 280 students in a Java programming course. The course had two parts. The first focused on introductory procedural programming and culminated with exam 1, while the second part covered more complex topics and object-oriented programming and ended with exam 2. To measure students' programming abilities, 51,095 code snapshots were collected from students while they completed assignments that were autograded based on unit tests. Compiler and runtime errors were extracted from the snapshots, and three measures -- Error Count, Error Quotient and Repeated Error Density -- were explored to identify the best measure explaining variability in exam grades. Models utilizing Error Quotient outperformed the models using the other two measures, in terms of the explained variability in grades and Bayesian Information Criterion. Compiler errors were significant predictors of exam 1 grades but not exam 2 grades; only runtime errors significantly predicted exam 2 grades. The findings indicate that leveraging Error Quotient with multiple error types (compiler and runtime) may be a better measure of students' introductory programming abilities, though still not explaining most of the observed variability.
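To make the error measures concrete, the sketch below shows how Error Count and Jadud's Error Quotient (EQ) are commonly computed from a student's chronological sequence of compilation events. This is a minimal illustrative sketch based on Jadud's widely cited scoring scheme (8 points when two consecutive compilations both fail, plus 3 when the error type repeats, normalized by the maximum of 11 and averaged over pairs), not the authors' implementation; the CompileEvent record is a hypothetical stand-in for the snapshot data, and Repeated Error Density is omitted.

```python
# Illustrative sketch (not the paper's code): Error Count and Jadud's Error Quotient
# computed from a chronological list of compilation events for one student.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompileEvent:
    # Error type reported by the compiler, or None if the compilation succeeded.
    error_type: Optional[str]

def error_count(events: list[CompileEvent]) -> int:
    """Error Count: total number of compilation events that ended in an error."""
    return sum(1 for e in events if e.error_type is not None)

def error_quotient(events: list[CompileEvent]) -> float:
    """Jadud's EQ: average penalty over consecutive pairs of compilations.

    A pair scores 8 if both compilations ended in an error, plus 3 more if the
    two errors are of the same type; each pair score is divided by the maximum
    of 11, and the result is averaged over all pairs (0 = no repeated errors,
    1 = every pair is the same repeated error).
    """
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for first, second in pairs:
        score = 0
        if first.error_type is not None and second.error_type is not None:
            score += 8
            if first.error_type == second.error_type:
                score += 3
        total += score / 11
    return total / len(pairs)

# Example session: two identical failing compilations, a different error, then a fix.
session = [
    CompileEvent("cannot find symbol"),
    CompileEvent("cannot find symbol"),
    CompileEvent("';' expected"),
    CompileEvent(None),
]
print(error_count(session))               # 3
print(round(error_quotient(session), 2))  # ((8+3)/11 + 8/11 + 0) / 3 ≈ 0.58
```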
Related papers
- From Bugs to Breakthroughs: Novice Errors in CS2 [1.0609815608017066]
We conducted a longitudinal study of errors that students of a CS2 course made in subsequent programming assignments.
We manually categorized 710 errors based on a modified version of an established error framework.
Students have little trouble learning the programming language itself, but need more time to understand and express concepts in it.
arXiv Detail & Related papers (2025-02-20T10:41:44Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Integrating Natural Language Prompting Tasks in Introductory Programming Courses [3.907735250728617]
This report explores the inclusion of two prompt-focused activities in an introductory programming course.
The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax.
The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code.
arXiv Detail & Related papers (2024-10-04T01:03:25Z) - SimGrade: Using Code Similarity Measures for More Accurate Human Grading [5.797317782326566]
We show that inaccurate and inconsistent grading of free-response programming problems is widespread in CS1 courses.
We propose several algorithms for (1) assigning student submissions to graders and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution.
arXiv Detail & Related papers (2024-02-19T23:06:23Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code [0.0]
We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments.
These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses.
arXiv Detail & Related papers (2023-03-09T16:52:12Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to provide feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z) - Students Struggle to Explain Their Own Program Code [0.0]
We ask students to explain the structure and execution of their small programs after they submit them to a programming exercise.
One third of the students struggled to explain their own program code.
Our results indicate that correctly answering properly aligned QLCs (questions about learners' code) has a stronger correlation with student success and retention than merely submitting a correct program.
arXiv Detail & Related papers (2021-04-14T09:13:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.