Related papers: SimGrade: Using Code Similarity Measures for More Accurate Human Grading

URL: http://arxiv.org/abs/2403.14637v1
Date: Mon, 19 Feb 2024 23:06:23 GMT
Title: SimGrade: Using Code Similarity Measures for More Accurate Human Grading
Authors: Sonja Johnson-Yu, Nicholas Bowman, Mehran Sahami, Chris Piech
Abstract summary: We show that inaccurate and inconsistent grading of free-response programming problems is widespread in CS1 courses. We propose several algorithms for (1) assigning student submissions to graders, and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution.
Score: 5.797317782326566
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While the use of programming problems on exams is a common form of summative assessment in CS courses, grading such exam problems can be a difficult and inconsistent process. Through an analysis of historical grading patterns, we show that inaccurate and inconsistent grading of free-response programming problems is widespread in CS1 courses. These inconsistencies necessitate the development of methods to ensure fairer and more accurate grading. In subsequent analysis of this historical exam data, we demonstrate that graders are able to more accurately assign a score to a student submission when they have previously seen another submission similar to it. As a result, we hypothesize that we can improve exam grading accuracy by ensuring that each submission that a grader sees is similar to at least one submission they have previously seen. We propose several algorithms for (1) assigning student submissions to graders, and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution, leveraging distributed representations of student code in order to measure similarity between submissions. Finally, we demonstrate in simulation that these algorithms achieve higher grading accuracy than the current standard random assignment process used for grading.
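To make the ordering idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes each submission has already been mapped to an embedding vector by some model, and it uses a simple greedy rule for a single grader's batch: always grade next the unseen submission that is most similar to anything already graded. The names `cosine_similarity` and `greedy_similarity_order` are hypothetical, as is the choice of cosine similarity as the measure.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_similarity_order(
    embeddings: Sequence[Sequence[float]], start: int = 0
) -> List[int]:
    """Return a grading order (list of indices) for one grader's batch.

    Assumption: embeddings[i] is a vector for submission i, produced elsewhere.
    Each step picks the ungraded submission whose best similarity to any
    already-graded submission is highest (ties go to the lower index).
    """
    n = len(embeddings)
    if n == 0:
        return []
    order = [start]
    remaining = set(range(n)) - {start}
    # best_sim[i]: highest similarity between submission i and anything graded so far
    best_sim = {
        i: cosine_similarity(embeddings[i], embeddings[start]) for i in remaining
    }
    while remaining:
        nxt = max(remaining, key=lambda i: (best_sim[i], -i))
        order.append(nxt)
        remaining.remove(nxt)
        # Newly graded submission may be a closer "seen" neighbor for the rest
        for i in remaining:
            sim = cosine_similarity(embeddings[i], embeddings[nxt])
            if sim > best_sim[i]:
                best_sim[i] = sim
    return order


if __name__ == "__main__":
    # Toy 2-D "embeddings": two clusters of similar submissions
    toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(greedy_similarity_order(toy))
```

In this toy run the grader starts with submission 0, then sees its near-duplicate 2 before moving to the other cluster. The sketch covers only ordering within one batch; how submissions are divided among graders is a separate step that it does not attempt.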

Related papers

CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming [56.17331530444765]
CPRet is a retrieval-oriented benchmark suite for competitive programming. It covers four retrieval tasks: two code-centric (i.e., Text-to-Code and Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate and Simplified-to-Full). Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation.
arXiv Detail & Related papers (2025-05-19T10:07:51Z)
CASET: Complexity Analysis using Simple Execution Traces for CS* submissions [0.0]
The most common method to auto-grade a student's submission in a CS1 or a CS2 course is to run it against a pre-defined test suite and compare the results against reference results. This technique cannot be used if the correctness of the solution goes beyond simple output, such as the algorithm used to obtain the result. We propose CASET, a novel tool to analyze the time complexity of algorithms using dynamic traces and unsupervised machine learning.
arXiv Detail & Related papers (2024-10-20T15:29:50Z)
Computer Aided Design and Grading for an Electronic Functional Programming Exam [0.0]
We introduce an algorithm to check Proof Puzzles based on finding correct sequences of proof lines that improves fairness compared to an existing, edit distance based algorithm. A higher-level language and open-source tool to specify regular expressions makes creating complex regular expressions less error-prone. We evaluate the resulting e-exam by analyzing the degree of automation in the grading process, asking students for their opinion, and critically reviewing our own experiences.
arXiv Detail & Related papers (2023-08-14T07:08:09Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Better Peer Grading through Bayesian Inference [13.113568233352986]
Peer grading systems aggregate noisy reports from multiple students to approximate a true grade as closely as possible. This paper extends the state of the art in three key ways: (1) recognizing that students can behave strategically; (2) appropriately handling censored data that arises from discrete-valued grading rubrics; and (3) using mixed integer programming to improve the interpretability of the grades assigned to students.
arXiv Detail & Related papers (2022-09-02T19:10:53Z)
Label Matching Semi-Supervised Object Detection [85.99282969977541]
Semi-supervised object detection has made significant progress with the development of mean teacher driven self-training. The label mismatch problem has not yet been fully explored in previous work, leading to severe confirmation bias during self-training. We propose a simple yet effective LabelMatch framework from two different yet complementary perspectives.
arXiv Detail & Related papers (2022-06-14T05:59:41Z)
Modeling and Correcting Bias in Sequential Evaluation [10.852140754372193]
We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied sequential bias in such settings, we propose a natural model for the evaluator's rating process. We conduct crowdsourcing experiments to demonstrate various facets of our model.
arXiv Detail & Related papers (2022-05-03T16:38:13Z)
Deep Probabilistic Graph Matching [72.6690550634166]
We propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints. The proposed method is evaluated on three popularly tested benchmarks (Pascal VOC, Willow Object and SPair-71k) and it outperforms all previous state-of-the-art methods on all benchmarks.
arXiv Detail & Related papers (2022-01-05T13:37:27Z)
ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification. A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors. Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data. There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups. We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence. The traditional learning process of seq2seq models suffers from two problems. We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
An end-to-end approach for the verification problem: learning the right distance [15.553424028461885]
We augment the metric learning setting by introducing a parametric pseudo-distance, trained jointly with the encoder. We first show it approximates a likelihood ratio which can be used for hypothesis tests. We observe training is much simplified under the proposed approach compared to metric learning with actual distances.
arXiv Detail & Related papers (2020-02-21T18:46:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.