Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades
- URL: http://arxiv.org/abs/2404.05988v1
- Date: Tue, 9 Apr 2024 03:45:15 GMT
- Title: Comparison of Three Programming Error Measures for Explaining Variability in CS1 Grades
- Authors: Valdemar Švábenský, Maciej Pankiewicz, Jiayi Zhang, Elizabeth B. Cloude, Ryan S. Baker, Eric Fouh
- Abstract summary: This study examined the relationships between students' rate of programming errors and their grades on two exams.
Data were collected from 280 students in a Java programming course.
- Score: 11.799817851619757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Programming courses can be challenging for first-year university students, especially for those without prior coding experience. Students initially struggle with code syntax, but as more advanced topics are introduced across a semester, the difficulty in learning to program shifts to learning computational thinking (e.g., debugging strategies). This study examined the relationships between students' rate of programming errors and their grades on two exams. Using an online integrated development environment, data were collected from 280 students in a Java programming course. The course had two parts. The first focused on introductory procedural programming and culminated with exam 1, while the second part covered more complex topics and object-oriented programming and ended with exam 2. To measure students' programming abilities, 51,095 code snapshots were collected from students while they completed assignments that were autograded based on unit tests. Compiler and runtime errors were extracted from the snapshots, and three measures -- Error Count, Error Quotient and Repeated Error Density -- were explored to identify the best measure explaining variability in exam grades. Models utilizing Error Quotient outperformed the models using the other two measures, in terms of the explained variability in grades and Bayesian Information Criterion. Compiler errors were significant predictors of exam 1 grades but not exam 2 grades; only runtime errors significantly predicted exam 2 grades. The findings indicate that leveraging Error Quotient with multiple error types (compiler and runtime) may be a better measure of students' introductory programming abilities, though still not explaining most of the observed variability.
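To make the error measures concrete, the sketch below shows how Error Count and Jadud's Error Quotient (EQ) are commonly computed from a student's chronological sequence of compilation events. This is a minimal illustrative sketch based on Jadud's widely cited scoring scheme (8 points when two consecutive compilations both fail, plus 3 when the error type repeats, normalized by the maximum of 11 and averaged over pairs), not the authors' implementation; the CompileEvent record is a hypothetical stand-in for the snapshot data, and Repeated Error Density is omitted.

```python
# Illustrative sketch (not the paper's code): Error Count and Jadud's Error Quotient
# computed from a chronological list of compilation events for one student.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompileEvent:
    # Error type reported by the compiler, or None if the compilation succeeded.
    error_type: Optional[str]

def error_count(events: list[CompileEvent]) -> int:
    """Error Count: total number of compilation events that ended in an error."""
    return sum(1 for e in events if e.error_type is not None)

def error_quotient(events: list[CompileEvent]) -> float:
    """Jadud's EQ: average penalty over consecutive pairs of compilations.

    A pair scores 8 if both compilations ended in an error, plus 3 more if the
    two errors are of the same type; each pair score is divided by the maximum
    of 11, and the result is averaged over all pairs (0 = no repeated errors,
    1 = every pair is the same repeated error).
    """
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return 0.0
    total = 0.0
    for first, second in pairs:
        score = 0
        if first.error_type is not None and second.error_type is not None:
            score += 8
            if first.error_type == second.error_type:
                score += 3
        total += score / 11
    return total / len(pairs)

# Example session: two identical failing compilations, a different error, then a fix.
session = [
    CompileEvent("cannot find symbol"),
    CompileEvent("cannot find symbol"),
    CompileEvent("';' expected"),
    CompileEvent(None),
]
print(error_count(session))               # 3
print(round(error_quotient(session), 2))  # ((8+3)/11 + 8/11 + 0) / 3 ≈ 0.58
```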
Related papers
- From Bugs to Breakthroughs: Novice Errors in CS2 [1.0609815608017066]
We conducted a longitudinal study of errors that students of a CS2 course made in subsequent programming assignments.
We manually categorized 710 errors based on a modified version of an established error framework.
Students have little trouble learning the programming language itself, but need more time to understand and express concepts in it.
arXiv Detail & Related papers (2025-02-20T10:41:44Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Integrating Natural Language Prompting Tasks in Introductory Programming Courses [3.907735250728617]
This report explores the inclusion of two prompt-focused activities in an introductory programming course.
The first requires students to solve computational problems by writing natural language prompts, emphasizing problem-solving over syntax.
The second involves students crafting prompts to generate code equivalent to provided fragments, to foster an understanding of the relationship between prompts and code.
arXiv Detail & Related papers (2024-10-04T01:03:25Z) - SimGrade: Using Code Similarity Measures for More Accurate Human Grading [5.797317782326566]
We show that inaccurate and inconsistent grading of free-response programming problems is widespread in CS1 courses.
We propose several algorithms for (1) assigning student submissions to graders and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution.
arXiv Detail & Related papers (2024-02-19T23:06:23Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code [0.0]
We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments.
These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses.
arXiv Detail & Related papers (2023-03-09T16:52:12Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to provide feedback on student code for a new programming question from just a few instructor-provided examples.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z) - Students Struggle to Explain Their Own Program Code [0.0]
We ask students to explain the structure and execution of their small programs after they submit them to a programming exercise.
One third of the students struggled to explain their own program code.
Our results indicate that correctly answering properly aligned QLCs (questions about learners' code) has a stronger correlation with student success and retention than merely submitting a correct program.
arXiv Detail & Related papers (2021-04-14T09:13:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.