A Multicriteria Evaluation for Data-Driven Programming Feedback Systems:
Accuracy, Effectiveness, Fallibility, and Students' Response
- URL: http://arxiv.org/abs/2208.05326v1
- Date: Wed, 27 Jul 2022 00:29:32 GMT
- Title: A Multicriteria Evaluation for Data-Driven Programming Feedback Systems:
Accuracy, Effectiveness, Fallibility, and Students' Response
- Authors: Preya Shabrina, Samiha Marwan, Andrew Bennison, Min Chi, Thomas Price,
Tiffany Barnes
- Abstract summary: Data-driven programming feedback systems can help novices to program in the absence of a human tutor.
Prior evaluations showed that these systems improve learning in terms of test scores or task completion efficiency.
However, these evaluations ignore crucial aspects that can impact learning or reveal insights important for future improvement: the inherent fallibility of the current state of the art, students' programming behavior in response to correct/incorrect feedback, and effective/ineffective system components.
- Score: 7.167352606079407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven programming feedback systems can help novices to program in the
absence of a human tutor. Prior evaluations showed that these systems improve
learning in terms of test scores or task completion efficiency. However,
crucial aspects which can impact learning or reveal insights important for
future improvement of such systems are ignored in these evaluations. These
aspects include the inherent fallibility of the current state of the art, students'
programming behavior in response to correct/incorrect feedback, and
effective/ineffective system components. Consequently, a great deal of
knowledge is yet to be discovered about such systems. In this paper, we apply a
multi-criteria evaluation with 5 criteria on a data-driven feedback system
integrated within a block-based novice programming environment. Each criterion
in the evaluation reveals a unique pivotal aspect of the system: 1) How
accurate the feedback system is; 2) How it guides students throughout
programming tasks; 3) How it helps students in task completion; 4) What happens
when it goes wrong; and 5) How students respond generally to the system. Our
evaluation results showed that the system was helpful to students due to its
effective design and feedback representation despite being fallible. However,
novices can be negatively impacted by this fallibility due to high reliance and
lack of self-evaluation. These negative impacts include increased working time
and the implementation or submission of incorrect/partially correct solutions. The
evaluation results reinforced the necessity of multi-criteria system
evaluations while revealing important insights helpful to ensuring proper usage
of data-driven feedback systems, designing fallibility mitigation steps, and
driving research for future improvement.
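To make the first criterion concrete, here is a minimal sketch (in Python) of how feedback accuracy could be estimated by comparing the system's verdict on each student code state against an expert judgment of the same state; the record structure, field names, and sample values are hypothetical, not the authors' actual logs or annotation scheme.

```python
# Minimal sketch of the accuracy criterion: agreement between system feedback
# and expert judgments. Field names and example records are hypothetical;
# the paper's actual log format and annotation scheme may differ.

def feedback_accuracy(events):
    """Return the fraction of feedback events whose system verdict
    matches the expert verdict for the same student code state."""
    if not events:
        return 0.0
    agree = sum(
        1 for e in events
        if e["system_says_correct"] == e["expert_says_correct"]
    )
    return agree / len(events)


if __name__ == "__main__":
    sample = [
        {"system_says_correct": True,  "expert_says_correct": True},
        {"system_says_correct": True,  "expert_says_correct": False},  # false positive
        {"system_says_correct": False, "expert_says_correct": False},
    ]
    print(f"accuracy = {feedback_accuracy(sample):.2f}")  # 0.67
```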
Related papers
- Pessimistic Evaluation [58.736490198613154]
We argue that evaluating information access systems assumes utilitarian values not aligned with traditions of information access based on equal access.
We advocate for pessimistic evaluation of information access systems focusing on worst case utility.
arXiv Detail & Related papers (2024-10-17T15:40:09Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
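For reference, below is a minimal sketch of the DPO objective for a single preferred/rejected feedback pair, written in plain Python; the log-probabilities, the scaling parameter beta, and the toy numbers are illustrative assumptions, and the paper's actual models and training setup are not reproduced here.

```python
import math

# Hedged sketch of the direct preference optimization (DPO) loss for one pair.
# Inputs are summed log-probabilities of the preferred (chosen) and rejected
# feedback under the policy being trained and under a frozen reference model.
def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Negative log-sigmoid of the scaled margin between the pair's log-ratios."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: the policy already favors the chosen feedback relative to the
# reference, so the loss is fairly small.
print(dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
               ref_chosen=-13.0, ref_rejected=-14.0))
```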
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - Adaptation of the Multi-Concept Multivariate Elo Rating System to Medical Students Training Data [6.222836318380985]
The Elo rating system is widely recognized for its proficiency in predicting student performance.
This paper presents an adaptation of a multi-concept variant of the Elo rating system to the data collected by a medical training platform.
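For context, the sketch below shows the basic Elo update that such learner-modeling variants build on; it is a single-concept illustration with an illustrative learning rate K, not the paper's multi-concept multivariate formulation.

```python
import math

# Basic Elo update for learner modeling (single concept). The paper's
# multi-concept multivariate variant (per-concept abilities, adaptive K)
# is not shown here; k = 0.4 is an illustrative choice.

def expected_correct(ability, difficulty):
    """Probability of a correct answer, logistic in the ability-difficulty gap."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def elo_update(ability, difficulty, correct, k=0.4):
    """Move student ability and item difficulty toward the observed outcome."""
    p = expected_correct(ability, difficulty)
    ability += k * (correct - p)
    difficulty -= k * (correct - p)
    return ability, difficulty

# Example: a correct answer on a harder-than-average item raises the ability estimate.
print(elo_update(ability=0.0, difficulty=0.5, correct=1))
```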
arXiv Detail & Related papers (2024-02-26T19:19:56Z) - Identifying Student Profiles Within Online Judge Systems Using
Explainable Artificial Intelligence [6.638206014723678]
Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students.
This work aims to tackle this limitation by considering the further exploitation of the information gathered by the OJ and automatically inferring feedback for both the student and the instructor.
arXiv Detail & Related papers (2024-01-29T12:11:30Z) - An integrated framework for developing and evaluating an automated
lecture style assessment system [0.784125444722239]
The proposed application utilizes specific measurable biometric characteristics, such as facial expressions, body activity, speech rate and intonation, hand movement, and facial pose.
Results indicate that participants found the application novel and useful in providing automated feedback regarding lecture quality.
arXiv Detail & Related papers (2023-11-30T21:31:21Z) - A Domain-Agnostic Approach for Characterization of Lifelong Learning
Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z) - Automatic Assessment of the Design Quality of Student Python and Java
Programs [0.0]
We propose a rule-based system that assesses student programs for quality of design and provides personalized, precise feedback on how to improve their work.
The students benefited from the system and the rate of design quality flaws dropped 47.84% on average over 4 different assignments, 2 in Python and 2 in Java, in comparison to the previous 2 to 3 years of student submissions.
arXiv Detail & Related papers (2022-08-22T06:04:10Z) - Improving Conversational Question Answering Systems after Deployment
using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
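As a rough illustration of that idea, the sketch below reweights the log-likelihood of answers the deployed system produced by the binary user feedback, with an importance-sampling correction for how likely the deployed system was to give that answer; the data layout, normalization, and toy values are assumptions, and the paper's exact objective may differ.

```python
import math

# Hedged sketch of feedback-weighted learning with importance sampling.
# interactions: (question, answer, feedback in {0, 1}, behavior_prob), where
# behavior_prob is the probability the deployed system assigned to the answer.
def feedback_weighted_loss(interactions, policy_log_prob):
    """Importance-weighted negative log-likelihood over logged interactions."""
    total, norm = 0.0, 0.0
    for question, answer, feedback, behavior_prob in interactions:
        weight = feedback / max(behavior_prob, 1e-8)  # keep only positively rated answers
        total += -weight * policy_log_prob(question, answer)
        norm += weight
    return total / norm if norm > 0 else 0.0

# Toy usage with a stand-in scoring function for the model being fine-tuned.
toy = [("q1", "a1", 1, 0.6), ("q2", "a2", 0, 0.3)]
print(feedback_weighted_loss(toy, lambda q, a: math.log(0.5)))
```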
arXiv Detail & Related papers (2020-11-01T19:50:34Z) - Soliciting Human-in-the-Loop User Feedback for Interactive Machine
Learning Reduces User Trust and Impressions of Model Accuracy [8.11839312231511]
Mixed-initiative systems allow users to interactively provide feedback to improve system performance.
Our research investigates how the act of providing feedback can affect user understanding of an intelligent system and its accuracy.
arXiv Detail & Related papers (2020-08-28T16:46:41Z) - Opportunities of a Machine Learning-based Decision Support System for
Stroke Rehabilitation Assessment [64.52563354823711]
Rehabilitation assessment is critical to determine an adequate intervention for a patient.
Current assessment practices mainly rely on the therapist's experience, and assessment is infrequently executed due to the limited availability of therapists.
We developed an intelligent decision support system that can identify salient features of assessment using reinforcement learning.
arXiv Detail & Related papers (2020-02-27T17:04:07Z)