A Multicriteria Evaluation for Data-Driven Programming Feedback Systems:
Accuracy, Effectiveness, Fallibility, and Students' Response
- URL: http://arxiv.org/abs/2208.05326v1
- Date: Wed, 27 Jul 2022 00:29:32 GMT
- Title: A Multicriteria Evaluation for Data-Driven Programming Feedback Systems:
Accuracy, Effectiveness, Fallibility, and Students' Response
- Authors: Preya Shabrina, Samiha Marwan, Andrew Bennison, Min Chi, Thomas Price,
Tiffany Barnes
- Abstract summary: Data-driven programming feedback systems can help novices to program in the absence of a human tutor.
Prior evaluations showed that these systems improve learning in terms of test scores or task completion efficiency.
However, these evaluations ignore crucial aspects that can impact learning or reveal insights important for future improvement: the inherent fallibility of the current state of the art, students' programming behavior in response to correct/incorrect feedback, and effective/ineffective system components.
- Score: 7.167352606079407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven programming feedback systems can help novices to program in the
absence of a human tutor. Prior evaluations showed that these systems improve
learning in terms of test scores or task completion efficiency. However,
crucial aspects which can impact learning or reveal insights important for
future improvement of such systems are ignored in these evaluations. These
aspects include the inherent fallibility of the current state of the art, students'
programming behavior in response to correct/incorrect feedback, and
effective/ineffective system components. Consequently, a great deal of
knowledge is yet to be discovered about such systems. In this paper, we apply a
multi-criteria evaluation with 5 criteria on a data-driven feedback system
integrated within a block-based novice programming environment. Each criterion
in the evaluation reveals a unique pivotal aspect of the system: 1) How
accurate the feedback system is; 2) How it guides students throughout
programming tasks; 3) How it helps students in task completion; 4) What happens
when it goes wrong; and 5) How students respond generally to the system. Our
evaluation results showed that the system was helpful to students due to its
effective design and feedback representation despite being fallible. However,
novices can be negatively impacted by this fallibility due to high reliance and
lack of self-evaluation. These negative impacts include increased working time
and the implementation or submission of incorrect/partially correct solutions. The
evaluation results reinforced the necessity of multi-criteria system
evaluations while revealing important insights helpful to ensuring proper usage
of data-driven feedback systems, designing fallibility mitigation steps, and
driving research for future improvement.
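To make the first criterion concrete, here is a minimal sketch (in Python) of how feedback accuracy could be estimated by comparing the system's verdict on each student code state against an expert judgment of the same state; the record structure, field names, and sample values are hypothetical, not the authors' actual logs or annotation scheme.

```python
# Minimal sketch of the accuracy criterion: agreement between system feedback
# and expert judgments. Field names and example records are hypothetical;
# the paper's actual log format and annotation scheme may differ.

def feedback_accuracy(events):
    """Return the fraction of feedback events whose system verdict
    matches the expert verdict for the same student code state."""
    if not events:
        return 0.0
    agree = sum(
        1 for e in events
        if e["system_says_correct"] == e["expert_says_correct"]
    )
    return agree / len(events)


if __name__ == "__main__":
    sample = [
        {"system_says_correct": True,  "expert_says_correct": True},
        {"system_says_correct": True,  "expert_says_correct": False},  # false positive
        {"system_says_correct": False, "expert_says_correct": False},
    ]
    print(f"accuracy = {feedback_accuracy(sample):.2f}")  # 0.67
```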
Related papers
- Pessimistic Evaluation [58.736490198613154]
We argue that evaluating information access systems assumes utilitarian values not aligned with traditions of information access based on equal access.
We advocate for pessimistic evaluation of information access systems focusing on worst case utility.
arXiv Detail & Related papers (2024-10-17T15:40:09Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
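For reference, below is a minimal sketch of the DPO objective for a single preferred/rejected feedback pair, written in plain Python; the log-probabilities, the scaling parameter beta, and the toy numbers are illustrative assumptions, and the paper's actual models and training setup are not reproduced here.

```python
import math

# Hedged sketch of the direct preference optimization (DPO) loss for one pair.
# Inputs are summed log-probabilities of the preferred (chosen) and rejected
# feedback under the policy being trained and under a frozen reference model.
def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Negative log-sigmoid of the scaled margin between the pair's log-ratios."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the policy already favors the chosen feedback relative to the
# reference, so the loss is fairly small.
print(dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
               ref_chosen=-13.0, ref_rejected=-14.0))
```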
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - Adaptation of the Multi-Concept Multivariate Elo Rating System to Medical Students Training Data [6.222836318380985]
The Elo rating system is widely recognized for its proficiency in predicting student performance.
This paper presents an adaptation of a multi-concept variant of the Elo rating system to the data collected by a medical training platform.
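For context, the sketch below shows the basic Elo update that such learner-modeling variants build on; it is a single-concept illustration with an illustrative learning rate K, not the paper's multi-concept multivariate formulation.

```python
import math

# Basic Elo update for learner modeling (single concept). The paper's
# multi-concept multivariate variant (per-concept abilities, adaptive K)
# is not shown here; k = 0.4 is an illustrative choice.

def expected_correct(ability, difficulty):
    """Probability of a correct answer, logistic in the ability-difficulty gap."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def elo_update(ability, difficulty, correct, k=0.4):
    """Move student ability and item difficulty toward the observed outcome."""
    p = expected_correct(ability, difficulty)
    ability += k * (correct - p)
    difficulty -= k * (correct - p)
    return ability, difficulty

# Example: a correct answer on a harder-than-average item raises the ability estimate.
print(elo_update(ability=0.0, difficulty=0.5, correct=1))
```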
arXiv Detail & Related papers (2024-02-26T19:19:56Z) - Identifying Student Profiles Within Online Judge Systems Using
Explainable Artificial Intelligence [6.638206014723678]
Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students.
This work aims to tackle this limitation by considering the further exploitation of the information gathered by the OJ and automatically inferring feedback for both the student and the instructor.
arXiv Detail & Related papers (2024-01-29T12:11:30Z) - An integrated framework for developing and evaluating an automated
lecture style assessment system [0.784125444722239]
The proposed application utilizes specific measurable biometric characteristics, such as facial expressions, body activity, speech rate and intonation, hand movement, and facial pose.
Results indicate that participants found the application novel and useful in providing automated feedback regarding lecture quality.
arXiv Detail & Related papers (2023-11-30T21:31:21Z) - A Domain-Agnostic Approach for Characterization of Lifelong Learning
Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z) - Automatic Assessment of the Design Quality of Student Python and Java
Programs [0.0]
We propose a rule-based system that assesses student programs for quality of design and provides personalized, precise feedback on how to improve their work.
The students benefited from the system and the rate of design quality flaws dropped 47.84% on average over 4 different assignments, 2 in Python and 2 in Java, in comparison to the previous 2 to 3 years of student submissions.
arXiv Detail & Related papers (2022-08-22T06:04:10Z) - Improving Conversational Question Answering Systems after Deployment
using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
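As a rough illustration of that idea, the sketch below reweights the log-likelihood of answers the deployed system produced by the binary user feedback, with an importance-sampling correction for how likely the deployed system was to give that answer; the data layout, normalization, and toy values are assumptions, and the paper's exact objective may differ.

```python
import math

# Hedged sketch of feedback-weighted learning with importance sampling.
# interactions: (question, answer, feedback in {0, 1}, behavior_prob), where
# behavior_prob is the probability the deployed system assigned to the answer.
def feedback_weighted_loss(interactions, policy_log_prob):
    """Importance-weighted negative log-likelihood over logged interactions."""
    total, norm = 0.0, 0.0
    for question, answer, feedback, behavior_prob in interactions:
        weight = feedback / max(behavior_prob, 1e-8)  # keep only positively rated answers
        total += -weight * policy_log_prob(question, answer)
        norm += weight
    return total / norm if norm > 0 else 0.0

# Toy usage with a stand-in scoring function for the model being fine-tuned.
toy = [("q1", "a1", 1, 0.6), ("q2", "a2", 0, 0.3)]
print(feedback_weighted_loss(toy, lambda q, a: math.log(0.5)))
```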
arXiv Detail & Related papers (2020-11-01T19:50:34Z) - Soliciting Human-in-the-Loop User Feedback for Interactive Machine
Learning Reduces User Trust and Impressions of Model Accuracy [8.11839312231511]
Mixed-initiative systems allow users to interactively provide feedback to improve system performance.
Our research investigates how the act of providing feedback can affect user understanding of an intelligent system and its accuracy.
arXiv Detail & Related papers (2020-08-28T16:46:41Z) - Opportunities of a Machine Learning-based Decision Support System for
Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practices mainly rely on the therapist's experience, and assessment is infrequently executed due to the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)