Automatic Assessment of the Design Quality of Python Programs with
Personalized Feedback
- URL: http://arxiv.org/abs/2106.01399v1
- Date: Wed, 2 Jun 2021 18:04:53 GMT
- Title: Automatic Assessment of the Design Quality of Python Programs with
Personalized Feedback
- Authors: J. Walker Orr, Nathaniel Russell
- Abstract summary: We propose a neural network model to assess the design of a program and provide personalized feedback to guide students on how to make corrections.
The model's effectiveness is evaluated on a corpus of student programs written in Python.
Students who participated in the study improved their program design scores by 19.58%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The assessment of program functionality can generally be accomplished with
straightforward unit tests. However, assessing the design quality of a program
is a much more difficult and nuanced problem. Design quality is an important
consideration since it affects the readability and maintainability of programs.
Assessing design quality and giving personalized feedback is a very
time-consuming task for instructors and teaching assistants. This limits
personalized feedback to small class settings. Further, design
quality is nuanced and is difficult to concisely express as a set of rules. For
these reasons, we propose a neural network model to both automatically assess
the design of a program and provide personalized feedback to guide students on
how to make corrections. The model's effectiveness is evaluated on a corpus of
student programs written in Python. The model achieves an accuracy of 83.67%
to 94.27%, depending on the dataset, when its predicted design scores are compared
to historical instructor assessments. Finally, we present a study where students
tried to improve the design of their programs based on the personalized
feedback produced by the model. Students who participated in the study improved
their program design scores by 19.58%.
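The abstract does not describe the model's inputs or architecture in detail. Purely as an illustration of the kind of structural, design-related signals an automated assessor might consume (as opposed to the behavioral checks a unit test performs), here is a minimal sketch using Python's standard ast module; the feature names and the example program are hypothetical and are not taken from the paper.

```python
import ast
import textwrap

def design_signals(source: str) -> dict:
    """Rough, illustrative design signals: function count, missing
    docstrings, and maximum nesting depth. Not the paper's features."""
    tree = ast.parse(source)
    signals = {"functions": 0, "missing_docstrings": 0, "max_nesting": 0}

    def depth(node: ast.AST, d: int = 0) -> int:
        # Deepest chain of child nodes below this node.
        children = list(ast.iter_child_nodes(node))
        return max([d] + [depth(c, d + 1) for c in children])

    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            signals["functions"] += 1
            if ast.get_docstring(node) is None:
                signals["missing_docstrings"] += 1
            signals["max_nesting"] = max(signals["max_nesting"], depth(node))
    return signals

# Hypothetical student submission, not taken from the paper's corpus.
student_program = textwrap.dedent("""
    def total(xs):
        s = 0
        for x in xs:
            s += x
        return s
""")

print(design_signals(student_program))
```

A learned model such as the one proposed in the paper would replace hand-written heuristics like these with representations learned from historically graded student programs, which is the motivation stated in the abstract for not encoding design quality as a fixed set of rules.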
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z) - Fuzzy Intelligent System for Student Software Project Evaluation [0.0]
This paper introduces a fuzzy intelligent system designed to evaluate academic software projects.
The system processes the input criteria to produce a quantifiable measure of project success.
Our approach standardizes project evaluations and helps to reduce the subjective bias in manual grading.
arXiv Detail & Related papers (2024-05-01T11:12:22Z) - QuRating: Selecting High-Quality Data for Training Language Models [64.83332850645074]
We introduce QuRating, a method for selecting pre-training data that can capture human intuitions about data quality.
In this paper, we investigate four qualities - writing style, required expertise, facts & trivia, and educational value.
We train a QuRater model to learn scalar ratings from pairwise judgments (a rough sketch of this pairwise-to-scalar idea follows the related-papers list below), and use it to annotate a 260B training corpus with quality ratings for each of the four criteria.
arXiv Detail & Related papers (2024-02-15T06:36:07Z) - Learning and Evaluating Human Preferences for Conversational Head
Generation [101.89332968344102]
We propose a novel learning-based evaluation metric named Preference Score (PS) that fits human preferences based on quantitative evaluations across different dimensions.
PS can serve as a quantitative evaluation without the need for human annotation.
arXiv Detail & Related papers (2023-07-20T07:04:16Z) - Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science.
Standard approaches require instructors to manually grade student-implemented interactive programs.
Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z) - Automatic Assessment of the Design Quality of Student Python and Java
Programs [0.0]
We propose a rule-based system that assesses student programs for design quality and provides personalized, precise feedback on how to improve their work.
The students benefited from the system and the rate of design quality flaws dropped 47.84% on average over 4 different assignments, 2 in Python and 2 in Java, in comparison to the previous 2 to 3 years of student submissions.
arXiv Detail & Related papers (2022-08-22T06:04:10Z) - A Multicriteria Evaluation for Data-Driven Programming Feedback Systems:
Accuracy, Effectiveness, Fallibility, and Students' Response [7.167352606079407]
Data-driven programming feedback systems can help novices to program in the absence of a human tutor.
Prior evaluations showed that these systems improve learning in terms of test scores or task-completion efficiency, but other important aspects remain under-examined.
These aspects include the inherent fallibility of the current state of the art, students' programming behavior in response to correct/incorrect feedback, and effective/ineffective system components.
arXiv Detail & Related papers (2022-07-27T00:29:32Z) - An Analysis of Programming Course Evaluations Before and After the
Introduction of an Autograder [1.329950749508442]
This paper studies the answers to the standardized university evaluation questionnaires of foundational computer science courses which recently introduced autograding.
We hypothesize how the autograder might have contributed to the significant changes in the data, such as improved interactions between tutors and students, improved overall course quality, improved learning success, increased time spent, and reduced difficulty.
The autograder technology can be validated as a teaching method to improve student satisfaction with programming courses.
arXiv Detail & Related papers (2021-10-28T14:09:44Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Graph-based, Self-Supervised Program Repair from Diagnostic Feedback [108.48853808418725]
We introduce a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback.
We then apply a graph neural network on top to model the reasoning process.
We present a self-supervised learning paradigm for program repair that leverages unlabeled programs available online.
arXiv Detail & Related papers (2020-05-20T07:24:28Z)
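The QuRating entry above mentions learning scalar ratings from pairwise judgments. As a rough, hypothetical sketch of that general pairwise-to-scalar idea (a Bradley-Terry style objective, not the paper's actual QuRater implementation or data), the following fits a linear scorer so that preferred items receive higher scalar scores:

```python
import torch

# Hypothetical toy data: each item is a feature vector; "pairs" lists
# (index of preferred item, index of the other item) judgments.
torch.manual_seed(0)
items = torch.randn(20, 8)                   # 20 items, 8 made-up features each
pairs = [(0, 1), (2, 3), (5, 4), (7, 6)]     # made-up pairwise preferences

rater = torch.nn.Linear(8, 1)                # maps features to a scalar rating
opt = torch.optim.Adam(rater.parameters(), lr=0.05)

for _ in range(200):
    opt.zero_grad()
    scores = rater(items).squeeze(-1)        # one scalar score per item
    # Bradley-Terry style loss: the preferred item should score higher;
    # softplus(s_loser - s_winner) is the negative log-likelihood per pair.
    loss = torch.stack([
        torch.nn.functional.softplus(scores[loser] - scores[winner])
        for winner, loser in pairs
    ]).mean()
    loss.backward()
    opt.step()

# The trained rater can now annotate unseen items with scalar scores.
print(rater(items[:3]).squeeze(-1).tolist())
```

Minimizing this per-pair softplus loss is equivalent to maximizing the Bradley-Terry likelihood that the preferred item in each judgment receives the higher scalar score.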
This list is automatically generated from the titles and abstracts of the papers on this site.