Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning
- URL: http://arxiv.org/abs/2403.01304v1
- Date: Sat, 2 Mar 2024 20:25:50 GMT
- Title: Improving the Validity of Automatically Generated Feedback via
Reinforcement Learning
- Authors: Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan
- Abstract summary: We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL). Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
- Score: 50.067342343957876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically generating feedback via large language models (LLMs) in
intelligent tutoring systems and online learning platforms has the potential to
improve the learning outcomes of many students. However, both feedback
generation and evaluation are challenging: feedback content has to be valid
especially in subjects like math, which requires models to understand the
problem, the solution, and where the student's error lies. Feedback also has to
be pedagogically valid to reflect effective tutoring strategies, such as
explaining possible misconceptions and encouraging the student, among other
desirable features. In this work, we address both problems of automatically
generating and evaluating feedback while considering both correctness and
alignment. First, we propose a rubric for evaluating math feedback and show
that GPT-4 is able to effectively use it to annotate human-written and
LLM-generated feedback. Second, we propose a framework for feedback generation
that optimizes both correctness and alignment using reinforcement learning
(RL). Specifically, we use GPT-4's annotations to create preferences over
feedback pairs in an augmented dataset for training via direct preference
optimization (DPO). We show that our methods significantly increase the
correctness and alignment of generated feedback with Llama 2, an open-source
LLM, qualitatively analyze our generation and evaluation systems using case
studies, and outline several areas for future work.
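The preference-based training the abstract describes can be sketched concretely. Below is a minimal, illustrative implementation of the DPO objective applied to GPT-4-ranked feedback pairs; the tensor names, beta value, and data layout are assumptions for exposition, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each tensor holds the summed token log-probabilities that the
    trainable policy / frozen reference model assigns to the preferred
    ("chosen") and dispreferred ("rejected") feedback in a pair, e.g.
    pairs ranked by GPT-4 under the paper's rubric (names hypothetical).
    """
    # Log-space ratio of policy to reference for each side of the pair.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected; beta limits how far
    # the policy may drift from the frozen reference model.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

In this setup the trainable policy would be Llama 2 and the reference a frozen copy of it, following standard DPO practice.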
Related papers
- Automated Feedback in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses [0.0]
This study aims to explore the potential of Large Language Models (LLMs) in facilitating automated feedback in math education.
We employ a math-focused variant of Mistral, an open-source LLM, and fine-tune this model for evaluating student responses by leveraging a dataset of student responses and teacher-written feedback for middle-school math problems.
We evaluate the model's scoring accuracy and the quality of its feedback using judgments from two teachers.
arXiv Detail & Related papers (2024-10-29T16:57:45Z)
- Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions [6.216542656489173]
We propose PROF, which PROduces Feedback by learning from LM-simulated student revisions.
We empirically test PROF and observe that it surpasses a variety of baseline methods at improving students' writing (a sketch of this revision-based signal follows below).
arXiv Detail & Related papers (2024-10-10T15:52:48Z)
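A hedged sketch of the simulated-revision signal PROF's summary suggests: feedback is valued by how much an LM-simulated student revision improves the draft. The names `simulate_revision` and `score_essay` are hypothetical wrappers, and the paper's actual learning procedure may differ.

```python
from typing import Callable

def revision_gain(draft: str, feedback: str,
                  simulate_revision: Callable[[str, str], str],
                  score_essay: Callable[[str], float]) -> float:
    """Reward for a piece of feedback: the improvement an LM-simulated
    student achieves after revising the draft with that feedback."""
    revised = simulate_revision(draft, feedback)
    return score_essay(revised) - score_essay(draft)
```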
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt alignment.
We show that its superiority over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z)
- Mining patterns in syntax trees to automate code reviews of student solutions for programming exercises [0.0]
We introduce ECHO, a machine learning method to automate the reuse of feedback in educational code reviews.
Based on annotations from both automated linting tools and human reviewers, we show that ECHO can accurately and quickly predict appropriate feedback annotations.
arXiv Detail & Related papers (2024-04-26T14:03:19Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show significant alignment between the LM judgments and human teacher preferences (a minimal sketch of this judge-as-reward loop follows below).
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
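As a hedged illustration of the judge-as-reward idea, assuming `generate` and `judge` are caller-supplied wrappers around two LMs (both names and the best-of-n strategy are hypothetical, not the paper's exact optimization procedure):

```python
from typing import Callable, List

def optimize_material(generate: Callable[[str], str],
                      judge: Callable[[str], float],
                      prompt: str,
                      n_candidates: int = 8) -> str:
    """One LM proposes instructional materials; a second LM's score
    acts as the reward used to select among them."""
    candidates: List[str] = [generate(prompt) for _ in range(n_candidates)]
    rewards = [judge(c) for c in candidates]
    # Best-of-n selection under the judge; a full optimizer would feed
    # these rewards back into the generator (e.g. via RL).
    return candidates[max(range(n_candidates), key=rewards.__getitem__)]
```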
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance the alignment of large language models.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems (see the sketch after this entry).
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
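A minimal, hypothetical sketch of CDF-style routing, with difficulty expressed as a score in [0, 1]; the thresholds are illustrative assumptions, not values from the paper.

```python
def feedback_type(difficulty: float) -> str:
    """Route a training problem to a feedback modality by difficulty,
    in the spirit of CDF (thresholds hypothetical)."""
    if difficulty < 1 / 3:
        return "critique"     # easy problems: critique feedback
    if difficulty < 2 / 3:
        return "refinement"   # medium problems: refinement feedback
    return "preference"       # hard problems: preference feedback
```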
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Using Large Language Models to Provide Explanatory Feedback to Human Tutors [3.2507682694499582]
We present two approaches for supplying tutors with real-time feedback, within an online lesson, on how to give students effective praise.
This work-in-progress demonstrates considerable accuracy in binary classification of effective or effort-based praise for corrective feedback.
More notably, we report progress toward an enhanced approach that provides explanatory feedback using large language model-facilitated named entity recognition.
arXiv Detail & Related papers (2023-06-27T14:19:12Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show that combining system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- Deep Discourse Analysis for Generating Personalized Feedback in Intelligent Tutor Systems [4.716555240531893]
We explore creating automated, personalized feedback in an intelligent tutoring system (ITS).
Our goal is to pinpoint correct and incorrect concepts in student answers in order to achieve better student learning gains.
arXiv Detail & Related papers (2021-03-13T20:33:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.