Automated Grading and Feedback Tools for Programming Education: A
Systematic Review
- URL: http://arxiv.org/abs/2306.11722v2
- Date: Wed, 6 Dec 2023 00:46:58 GMT
- Title: Automated Grading and Feedback Tools for Programming Education: A
Systematic Review
- Authors: Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi
- Abstract summary: Most papers assess the correctness of assignments in object-oriented languages.
Few tools assess the maintainability, readability or documentation of the source code.
Most tools offered fully automated assessment to allow for near-instantaneous feedback.
- Score: 7.776434991976473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We conducted a systematic literature review on automated grading and feedback
tools for programming education.
We analysed 121 research papers from 2017 to 2021 inclusive and categorised
them based on skills assessed, approach, language paradigm, degree of
automation and evaluation techniques.
Most papers assess the correctness of assignments in object-oriented
languages.
Typically, these tools use dynamic techniques, primarily unit testing, to
provide grades and feedback to students, or static analysis techniques to
compare a submission with a reference solution or with a set of correct student
submissions.
However, the feedback from these techniques is often limited to whether the
unit tests passed or failed, the expected and actual output, or how the
submission differs from the reference solution.
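As a rough illustration of the unit-testing approach, here is a minimal sketch; it is not taken from any specific reviewed tool, and the student function, test cases, and point values are hypothetical:

```python
# Minimal sketch of unit-test based grading (illustrative only; the student
# function, test cases, and point values below are hypothetical).
import unittest


def add(a, b):
    # Stand-in for a function loaded from a student submission.
    return a + b


class GradingTests(unittest.TestCase):
    # Feedback mirrors the typical pass/fail plus expected-vs-actual output.
    def test_small_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)


def grade(test_case, points_per_test=5):
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    passed = result.testsRun - len(result.failures) - len(result.errors)
    feedback = [f"{test.id()}: {trace.strip().splitlines()[-1]}"
                for test, trace in result.failures + result.errors]
    return passed * points_per_test, feedback


if __name__ == "__main__":
    score, feedback = grade(GradingTests)
    print("Score:", score)
    print("Feedback:", feedback or ["all tests passed"])
```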
Furthermore, few tools assess the maintainability, readability or
documentation of the source code; those that do mostly use static analysis
techniques, such as code quality metrics, in conjunction with grading
correctness.
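The following is a hedged sketch of the kind of simple documentation and readability signals such a static check might compute; the metric choices are assumptions, not drawn from any reviewed tool:

```python
# Illustrative static-analysis sketch of simple code-quality signals
# (documentation and readability); the metric choices are assumptions,
# not taken from any specific reviewed tool.
import ast
import textwrap

SUBMISSION = textwrap.dedent('''
    def mean(values):
        """Return the arithmetic mean of a non-empty sequence."""
        return sum(values) / len(values)

    def f(x):
        return x * x
''')


def quality_metrics(source: str) -> dict:
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    documented = sum(1 for fn in functions if ast.get_docstring(fn))
    lengths = [fn.end_lineno - fn.lineno + 1 for fn in functions]
    return {
        "functions": len(functions),
        "docstring_ratio": documented / len(functions) if functions else 1.0,
        "avg_function_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "single_letter_names": sum(1 for fn in functions if len(fn.name) == 1),
    }


if __name__ == "__main__":
    print(quality_metrics(SUBMISSION))
```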
Additionally, we found that most tools offered fully automated assessment to
allow for near-instantaneous feedback and multiple resubmissions, which can
increase student satisfaction and provide them with more opportunities to
succeed.
In terms of techniques used to evaluate the tools' performance, most papers
primarily use student surveys or compare the tools' grades or feedback with
those provided by human graders.
However, because the evaluation datasets are frequently unavailable, it is
difficult to reproduce results or to compare tools on a common collection of
assignments.
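As a minimal sketch of the second evaluation style, comparing a tool's automatic grades with a human grader's marks, one might compute simple agreement statistics; the grade vectors below are made-up illustrative data, not results from any reviewed paper:

```python
# Hedged sketch of one evaluation style mentioned above: comparing a tool's
# automatic grades with a human grader's marks. The grade vectors are
# made-up illustrative data, not results from any reviewed paper.
from statistics import mean


def pearson_r(xs, ys):
    # Pearson correlation between the two grade vectors.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mean_absolute_error(xs, ys):
    # Average absolute gap between automatic and human grades.
    return mean(abs(x - y) for x, y in zip(xs, ys))


if __name__ == "__main__":
    auto_grades = [78, 92, 55, 67, 88, 40]    # hypothetical tool output
    human_grades = [75, 95, 60, 65, 85, 45]   # hypothetical instructor marks
    print("Pearson r:", round(pearson_r(auto_grades, human_grades), 3))
    print("MAE:", round(mean_absolute_error(auto_grades, human_grades), 2))
```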
Related papers
- TOOLVERIFIER: Generalization to New Tools via Self-Verification [69.85190990517184]
We introduce a self-verification method which distinguishes between close candidates by self-asking contrastive questions during tool selection.
Experiments on 4 tasks from the ToolBench benchmark, consisting of 17 unseen tools, demonstrate an average improvement of 22% over few-shot baselines.
arXiv Detail & Related papers (2024-02-21T22:41:38Z)
- Improving Automated Code Reviews: Learning from Experience [12.573740138977065]
This study investigates whether higher-quality reviews can be generated from automated code review models.
We find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews.
arXiv Detail & Related papers (2024-02-06T07:48:22Z)
- Code Review Automation: Strengths and Weaknesses of the State of the Art [14.313783664862923]
Three code review automation techniques tend to succeed or fail on the two tasks described in this paper.
The study has a strong qualitative focus, with 105 man-hours of manual inspection invested in analyzing correct and wrong predictions.
arXiv Detail & Related papers (2024-01-10T13:00:18Z)
- Analyzing Dataset Annotation Quality Management in the Wild [63.07224587146207]
Even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, biases, or artifacts.
While practices and guidelines regarding dataset creation projects exist, large-scale analysis has yet to be performed on how quality management is conducted.
arXiv Detail & Related papers (2023-07-16T21:22:40Z)
- A survey on grading format of automated grading tools for programming assignments [0.0]
The prevalence of online platforms and studies has generated the demand for automated grading tools.
This survey studies and evaluates the automated grading tools based on their evaluation format.
arXiv Detail & Related papers (2022-12-04T00:49:16Z)
- ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models [102.63817106363597]
We build ELEVATER, the first benchmark to compare and evaluate pre-trained language-augmented visual models.
It consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge.
We will release our toolkit and evaluation platforms for the research community.
arXiv Detail & Related papers (2022-04-19T10:23:42Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two) and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Open Source Software for Efficient and Transparent Reviews [0.11179881480027788]
ASReview is an open source machine learning-aided pipeline applying active learning.
We demonstrate by means of simulation studies that ASReview can yield far more efficient reviewing than manual reviewing.
arXiv Detail & Related papers (2020-06-22T11:57:10Z)
- Automated Content Grading Using Machine Learning [0.0]
This research project is a primitive experiment in the automation of grading of theoretical answers written in exams by students in technical courses.
We show how the algorithmic approach in machine learning can be used to automatically examine and grade theoretical content in exam answer papers.
arXiv Detail & Related papers (2020-04-08T23:46:24Z)