"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
- URL: http://arxiv.org/abs/2409.07453v1
- Date: Wed, 11 Sep 2024 17:59:01 GMT
- Title: "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
- Authors: Shengxin Hong, Chang Cai, Sixuan Du, Haiyue Feng, Siyuan Liu, Xiuyi Fan,
- Abstract summary: This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback.
CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation.
A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback.
- Score: 6.810086342993699
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents), and then a Teacher Agent aggregates the evaluations through formal reasoning to generate feedback and grades. Students can further engage with the feedback to refine their understanding. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.
Related papers
- Joint Learning of Context and Feedback Embeddings in Spoken Dialogue [3.8673630752805446]
We investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective.
Our results show that the model outperforms humans given the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
arXiv Detail & Related papers (2024-06-11T14:22:37Z) - Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments.
This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair.
We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
arXiv Detail & Related papers (2024-05-01T03:52:39Z) - Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning [3.2721068185888127]
We present a proof-of-concept application to offer students dynamic and contextualized feedback.
Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT.
We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context.
arXiv Detail & Related papers (2024-04-28T17:56:14Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied.
We focus on how the evaluation of task-oriented dialogue systems ( TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimize both correctness and alignment using reinforcement learning (RL)
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO)
arXiv Detail & Related papers (2024-03-02T20:25:50Z) - UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z) - Rethinking the Evaluation for Conversational Recommendation in the Era
of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs)
In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol.
We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z) - Improving Conversational Question Answering Systems after Deployment
using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z) - Facial Feedback for Reinforcement Learning: A Case Study and Offline
Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agent learning from trainers' facial expressions via interpreting them as evaluative feedback.
With designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracies for estimating positive and negative feedback.
Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.