Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning
- URL: http://arxiv.org/abs/2403.01304v1
- Date: Sat, 2 Mar 2024 20:25:50 GMT
- Title: Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning
- Authors: Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan
- Abstract summary: We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
- Score: 50.067342343957876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically generating feedback via large language models (LLMs) in
intelligent tutoring systems and online learning platforms has the potential to
improve the learning outcomes of many students. However, both feedback
generation and evaluation are challenging: feedback content has to be valid
especially in subjects like math, which requires models to understand the
problem, the solution, and where the student's error lies. Feedback also has to
be pedagogically valid to reflect effective tutoring strategies, such as
explaining possible misconceptions and encouraging the student, among other
desirable features. In this work, we address both problems of automatically
generating and evaluating feedback while considering both correctness and
alignment. First, we propose a rubric for evaluating math feedback and show
that GPT-4 is able to effectively use it to annotate human-written and
LLM-generated feedback. Second, we propose a framework for feedback generation
that optimizes both correctness and alignment using reinforcement learning
(RL). Specifically, we use GPT-4's annotations to create preferences over
feedback pairs in an augmented dataset for training via direct preference
optimization (DPO). We show that our methods significantly increase the
correctness and alignment of generated feedback with Llama 2, an open-source
LLM, qualitatively analyze our generation and evaluation systems using case
studies, and outline several areas for future work.
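The preference-based training the abstract describes can be sketched concretely. Below is a minimal, illustrative implementation of the DPO objective applied to GPT-4-ranked feedback pairs; the tensor names, beta value, and data layout are assumptions for exposition, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each tensor holds the summed token log-probabilities that the
    trainable policy / frozen reference model assigns to the preferred
    ("chosen") and dispreferred ("rejected") feedback in a pair, e.g.
    pairs ranked by GPT-4 under the paper's rubric (names hypothetical).
    """
    # Log-space ratio of policy to reference for each side of the pair.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected; beta limits how far
    # the policy may drift from the frozen reference model.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In this setup the trainable policy would be Llama 2 and the reference a frozen copy of it, following standard DPO practice.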
Related papers
- Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses [0.0]
This study aims to explore the potential of Large Language Models (LLMs) in facilitating automated feedback in math education.
We employ a math-focused variant of Mistral, an open-source LLM, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems.
We evaluate the model's scoring accuracy and the quality of its feedback using judgments from two teachers.
arXiv Detail & Related papers (2024-10-29T16:57:45Z)
- Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions [6.216542656489173]
We propose PROF, which PROduces Feedback by learning from LM-simulated student revisions.
We empirically test PROF and observe that it surpasses a variety of baseline methods at improving students' writing (a sketch of this revision-based signal follows below).
arXiv Detail & Related papers (2024-10-10T15:52:48Z)
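A hedged sketch of the simulated-revision signal PROF's summary suggests: feedback is valued by how much an LM-simulated student revision improves the draft. The names `simulate_revision` and `score_essay` are hypothetical wrappers, and the paper's actual learning procedure may differ.

```python
from typing import Callable

def revision_gain(draft: str, feedback: str,
                  simulate_revision: Callable[[str, str], str],
                  score_essay: Callable[[str], float]) -> float:
    """Reward for a piece of feedback: the improvement an LM-simulated
    student achieves after revising the draft with that feedback."""
    revised = simulate_revision(draft, feedback)
    return score_essay(revised) - score_essay(draft)
```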
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt alignment.
We show that its superiority over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z)
- Mining patterns in syntax trees to automate code reviews of student solutions for programming exercises [0.0]
We introduce ECHO, a machine learning method to automate the reuse of feedback in educational code reviews.
Based on annotations from both automated linting tools and human reviewers, we show that ECHO can accurately and quickly predict appropriate feedback annotations.
arXiv Detail & Related papers (2024-04-26T14:03:19Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show significant alignment between the LM judgments and human teacher preferences (a minimal sketch of this judge-as-reward loop follows below).
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
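As a hedged illustration of the judge-as-reward idea, assuming `generate` and `judge` are caller-supplied wrappers around two LMs (both names and the best-of-n strategy are hypothetical, not the paper's exact optimization procedure):

```python
from typing import Callable, List

def optimize_material(generate: Callable[[str], str],
                      judge: Callable[[str], float],
                      prompt: str,
                      n_candidates: int = 8) -> str:
    """One LM proposes instructional materials; a second LM's score
    acts as the reward used to select among them."""
    candidates: List[str] = [generate(prompt) for _ in range(n_candidates)]
    rewards = [judge(c) for c in candidates]
    # Best-of-n selection under the judge; a full optimizer would feed
    # these rewards back into the generator (e.g. via RL).
    return candidates[max(range(n_candidates), key=rewards.__getitem__)]
```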
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance the alignment of large language models.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems (see the sketch after this entry).
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
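A minimal, hypothetical sketch of CDF-style routing, with difficulty expressed as a score in [0, 1]; the thresholds are illustrative assumptions, not values from the paper.

```python
def feedback_type(difficulty: float) -> str:
    """Route a training problem to a feedback modality by difficulty,
    in the spirit of CDF (thresholds hypothetical)."""
    if difficulty < 1 / 3:
        return "critique"     # easy problems: critique feedback
    if difficulty < 2 / 3:
        return "refinement"   # medium problems: refinement feedback
    return "preference"       # hard problems: preference feedback
```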
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Using Large Language Models to Provide Explanatory Feedback to Human Tutors [3.2507682694499582]
We present two approaches for supplying tutors with real-time feedback, within an online lesson, on how to give students effective praise.
This work-in-progress demonstrates considerable accuracy in binary classification of effective or effort-based praise for corrective feedback.
More notably, we report progress toward an enhanced approach that provides explanatory feedback using large language model-facilitated named entity recognition.
arXiv Detail & Related papers (2023-06-27T14:19:12Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show that combining system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems [4.716555240531893]
We explore creating automated, personalized feedback in an intelligent tutoring system (ITS).
Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains.
arXiv Detail & Related papers (2021-03-13T20:33:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.