Related papers: "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays

"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays

URL: http://arxiv.org/abs/2409.07453v1
Date: Wed, 11 Sep 2024 17:59:01 GMT
Title: "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays
Authors: Shengxin Hong, Chang Cai, Sixuan Du, Haiyue Feng, Siyuan Liu, Xiuyi Fan,
Abstract summary: This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback.
Score: 6.810086342993699
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents), and then a Teacher Agent aggregates the evaluations through formal reasoning to generate feedback and grades. Students can further engage with the feedback to refine their understanding. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.

Related papers

User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal [58.43749783815486]
We study implicit user feedback in two user-LM interaction datasets.<n>We find that the contents of user feedback can improve model performance in short human-designed questions.<n>We also find that the usefulness of user feedback is largely tied to the quality of the user's initial prompt.
arXiv Detail & Related papers (2025-07-30T23:33:29Z)
Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models [54.85405423240165]
We introduce Interactive Reasoning, an interaction design that visualizes chain-of-thought outputs as a hierarchy of topics.<n>We implement interactive reasoning in Hippo, a prototype for AI-assisted decision making in the face of uncertain trade-offs.
arXiv Detail & Related papers (2025-06-30T10:00:43Z)
Exploring LLM-Generated Feedback for Economics Essays: How Teaching Assistants Evaluate and Envision Its Use [3.345149032274467]
This project examines the prospect of using AI-generated feedback as suggestions to expedite and enhance human instructors' feedback provision.<n>We developed a feedback engine that generates feedback on students' essays based on grading rubrics used by the teaching assistants (TAs)<n>We performed think-aloud studies with 5 TAs over 20 1-hour sessions to have them evaluate the AI feedback, contrast the AI feedback with their handwritten feedback, and share how they envision using the AI feedback if they were offered as suggestions.
arXiv Detail & Related papers (2025-05-21T14:50:30Z)
Alignment Drift in CEFR-prompted LLMs for Interactive Spanish Tutoring [0.0]
This paper investigates the potentials of Large Language Models (LLMs) as adaptive tutors in the context of second-language learning.<n>We simulate full teacher-student dialogues in Spanish using instruction-tuned, open-source LLMs ranging in size from 7B to 12B parameters.<n>The output from the tutor model is then used to evaluate the effectiveness of CEFR-based prompting to control text difficulty across three proficiency levels.
arXiv Detail & Related papers (2025-05-13T08:50:57Z)
Playpen: An Environment for Exploring Learning Through Conversational Interaction [81.67330926729015]
We investigate whether Dialogue Games can also serve as a source of feedback signals for learning.<n>We introduce Playpen, an environment for off- and online learning through Dialogue Game self-play.<n>We find that imitation learning through SFT improves performance on unseen instances, but negatively impacts other skills.
arXiv Detail & Related papers (2025-04-11T14:49:33Z)
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models [38.1620443730172]
Self-Correction based on feedback improves the output quality of Large Language Models (LLMs) In this study, we demonstrate that clarifying intentions is essential for effectively reducing biases in LLMs through Self-Correction.
arXiv Detail & Related papers (2025-03-08T02:20:43Z)
SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems [5.191286314473505]
Synthetic Educational Feedback Loops (SEFL) is a novel framework designed to deliver immediate, on-demand feedback at scale. Two large language models (LLMs) operate in teacher--student roles to simulate assignment completion and formative feedback. We show that SEFL-tuned models outperform their non-tuned counterparts in feedback quality, clarity, and timeliness.
arXiv Detail & Related papers (2025-02-18T15:09:29Z)
Oversight in Action: Experiences with Instructor-Moderated LLM Responses in an Online Discussion Forum [2.86800540498016]
This paper presents the design, deployment, and evaluation of a bot' module that is controlled by the instructor. The bot generates draft responses to student questions, which are reviewed, modified, and approved before release. We report our experiences using this tool in a 12-week second-year software engineering course on object-oriented programming.
arXiv Detail & Related papers (2024-12-12T08:17:33Z)
Joint Learning of Context and Feedback Embeddings in Spoken Dialogue [3.8673630752805446]
We investigate the possibility of embedding short dialogue contexts and feedback responses in the same representation space using a contrastive learning objective. Our results show that the model outperforms humans given the same ranking task and that the learned embeddings carry information about the conversational function of feedback responses.
arXiv Detail & Related papers (2024-06-11T14:22:37Z)
Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments. This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair. We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
arXiv Detail & Related papers (2024-05-01T03:52:39Z)
Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning [3.2721068185888127]
We present a proof-of-concept application to offer students dynamic and contextualized feedback. Specifically, we augment an Online Programming Exercise bot for a college-level Cloud Computing course with ChatGPT. We demonstrate that LLMs can be used to generate highly situated reflection triggers that incorporate details of the collaborative discussion happening in context.
arXiv Detail & Related papers (2024-04-28T17:56:14Z)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems ( TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimize both correctness and alignment using reinforcement learning (RL) Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO)
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course. We introduce UKP-SQuARE as a platform for QA education. Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs) In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol. We propose an interactive Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback. Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework [51.237191651923666]
We investigate the potential of agent learning from trainers' facial expressions via interpreting them as evaluative feedback. With designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions and competition can improve the accuracies for estimating positive and negative feedback. Our results with a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible.
arXiv Detail & Related papers (2020-01-23T17:50:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.