Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?
- URL: http://arxiv.org/abs/2405.06414v2
- Date: Mon, 8 Jul 2024 18:41:10 GMT
- Title: Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions?
- Authors: Hunter McNichols, Jaewook Lee, Stephen Fancsali, Steve Ritter, Andrew Lan
- Abstract summary: We study the capabilities of large language models to generate feedback for open-ended math questions.
We find that open-source and proprietary models both show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors.
- Score: 3.7399138244928145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Intelligent Tutoring Systems (ITSs) often contain an automated feedback component, which provides a predefined feedback message to students when it detects a predefined error. Such feedback components are typically built with template-based approaches, which require significant effort from human experts to anticipate a limited set of possible student errors and author the corresponding feedback. This limitation is exemplified in open-ended math questions, where students can make a large number of distinct errors. In our work, we examine the capability of large language models (LLMs) to generate feedback for open-ended math questions, similar to that of an established ITS that uses a template-based approach. We fine-tune both open-source and proprietary LLMs on real student responses and the corresponding ITS-provided feedback, and measure the quality of the generated feedback using text similarity metrics. We find that both open-source and proprietary models show promise in replicating the feedback they see during training, but do not generalize well to previously unseen student errors. These results suggest that, despite being able to learn the formatting of feedback, LLMs do not fully understand the mathematical errors made by students.
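The abstract does not name the specific text similarity metrics used to compare generated feedback against the ITS reference message; a token-overlap F1, shown below as a minimal self-contained sketch, is one common choice for illustration only (the function and example strings are ours, not the paper's).

```python
from collections import Counter

def token_f1(generated: str, reference: str) -> float:
    """Token-level F1 overlap between generated feedback and the ITS reference message."""
    gen_tokens = generated.lower().split()
    ref_tokens = reference.lower().split()
    if not gen_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(gen_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: compare LLM-generated feedback against a template-based ITS message.
print(token_f1("Check the sign when subtracting 7 from both sides.",
               "Remember to change the sign when you subtract 7 from both sides."))
```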
Related papers
- Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors [78.53699244846285]
Large language models (LLMs) present an opportunity to scale high-quality personalized education to all.
LLMs struggle to precisely detect students' errors and tailor their feedback to these errors.
Inspired by real-world teaching practice where teachers identify student errors and customize their response based on them, we focus on verifying student solutions.
arXiv Detail & Related papers (2024-07-12T10:11:40Z)
- Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models [2.1485350418225244]
Large language model (LLM)-based methods have shown great promise in feedback generation for programming assignments.
This paper explores using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair.
We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers.
arXiv Detail & Related papers (2024-05-01T03:52:39Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
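As a rough illustration of the DPO objective only (not that paper's exact training setup), the sketch below computes the standard DPO loss from policy and frozen-reference log-probabilities of the preferred ("chosen") and dispreferred ("rejected") feedback in a batch; all names are ours.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of (chosen, rejected) feedback pairs."""
    # Implicit rewards: log-ratio of the policy to the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Push the policy to rank the preferred feedback above the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```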
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- Extracting Self-Consistent Causal Insights from Users Feedback with LLMs and In-context Learning [11.609805521822878]
Microsoft Windows Feedback Hub is designed to receive customer feedback on a wide variety of subjects including critical topics such as power and battery.
To better understand and triage issues, we leverage Double Machine Learning (DML) to associate users' feedback with telemetry signals.
Our approach is able to extract previously known issues, uncover new bugs, and identify sequences of events that lead to a bug.
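The entry mentions Double Machine Learning (DML) to associate user feedback with telemetry signals. Below is a generic sketch of the partialling-out DML estimator, not necessarily that paper's exact pipeline: the scikit-learn models and variable names (`X` for confounding telemetry features, `t` for a treatment signal, `y` for an outcome) are our assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def dml_effect(X: np.ndarray, t: np.ndarray, y: np.ndarray) -> float:
    """Partialling-out DML: residualize y and t on X with cross-fitting, then regress residuals."""
    y_res = y - cross_val_predict(RandomForestRegressor(n_estimators=200), X, y, cv=5)
    t_res = t - cross_val_predict(RandomForestRegressor(n_estimators=200), X, t, cv=5)
    # Estimated effect of the treatment signal on the outcome, net of confounders in X.
    return LinearRegression(fit_intercept=False).fit(t_res.reshape(-1, 1), y_res).coef_[0]
```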
arXiv Detail & Related papers (2023-12-11T20:12:46Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- A large language model-assisted education tool to provide feedback on open-ended responses [2.624902795082451]
We present a tool that uses large language models (LLMs), guided by instructor-defined criteria, to automate responses to open-ended questions.
Our tool delivers rapid personalized feedback, enabling students to quickly test their knowledge and identify areas for improvement.
arXiv Detail & Related papers (2023-07-25T19:49:55Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Generating High-Precision Feedback for Programming Syntax Errors using Large Language Models [23.25258654890813]
Large language models (LLMs) hold great promise in enhancing programming education by automatically generating feedback for students.
We introduce PyFiXV, our technique to generate high-precision feedback powered by Codex.
arXiv Detail & Related papers (2023-01-24T13:00:25Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
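As a toy illustration of bandit learning from simulated feedback (not that paper's exact objective), the sketch below turns a simulated correctness signal on a model-predicted answer span into a REINFORCE-style loss; the reward values and names are placeholders of ours.

```python
import torch

def simulated_bandit_loss(span_logprob: torch.Tensor,
                          predicted_span: str,
                          gold_span: str) -> torch.Tensor:
    """REINFORCE-style bandit loss using feedback simulated from supervised data."""
    # Simulated user feedback: positive reward if the predicted span matches the gold answer.
    reward = 1.0 if predicted_span.strip() == gold_span.strip() else -0.1
    # Scale the log-probability of the sampled answer by the observed reward.
    return -reward * span_logprob
```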
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback on 16,000 student exam solutions in a programming course offered by a tier 1 university.
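The entry frames feedback as few-shot classification with a meta-learner. Below is a minimal prototypical-network-style sketch of that general idea (not ProtoTransformer itself), assuming each student submission has already been embedded as a vector; all names are ours.

```python
import torch

def prototype_feedback(support_emb: torch.Tensor,     # (n_support, d) instructor-labeled examples
                       support_labels: torch.Tensor,  # (n_support,) feedback-class ids
                       query_emb: torch.Tensor,       # (n_query, d) new student submissions
                       num_classes: int) -> torch.Tensor:
    """Assign each query submission the feedback class of its nearest prototype."""
    # One prototype per feedback class: the mean embedding of its few support examples.
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])
    # Classify by nearest prototype under Euclidean distance.
    return torch.cdist(query_emb, prototypes).argmin(dim=1)
```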
arXiv Detail & Related papers (2021-07-23T22:41:28Z)