Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation
- URL: http://arxiv.org/abs/2404.15845v1
- Date: Wed, 24 Apr 2024 12:48:06 GMT
- Title: Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation
- Authors: Maja Stahl, Leon Biermann, Andreas Nehring, Henning Wachsmuth
- Abstract summary: Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text.
This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback.
Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback.
- Score: 13.854903594424876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback remains low ultimately.
Related papers
- SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems [5.191286314473505]
Synthetic Educational Feedback Loops (SEFL) is a novel framework designed to deliver immediate, on-demand feedback at scale.
Two large language models (LLMs) operate in teacher--student roles to simulate assignment completion and formative feedback.
We show that SEFL-tuned models outperform their non-tuned counterparts in feedback quality, clarity, and timeliness.
arXiv Detail & Related papers (2025-02-18T15:09:29Z)
- eRevise+RF: A Writing Evaluation System for Assessing Student Essay Revisions and Providing Formative Feedback [1.5367711550341163]
eRevise+RF is an enhanced AWE system for assessing student essay revisions and providing revision feedback.
We deployed the system with 6 teachers and 406 students across 3 schools in Pennsylvania and Louisiana.
The results confirmed its effectiveness in (1) assessing student essays in terms of evidence usage, (2) extracting evidence and reasoning revisions across essays, and (3) determining revision success in responding to feedback.
arXiv Detail & Related papers (2025-01-01T03:49:48Z)
- Streamlining the review process: AI-generated annotations in research manuscripts [0.5735035463793009]
This study explores the potential of integrating Large Language Models (LLMs) into the peer-review process to enhance efficiency without compromising effectiveness.
We focus on manuscript annotations, particularly excerpt highlights, as a potential area for AI-human collaboration.
This paper introduces AnnotateGPT, a platform that utilizes GPT-4 for manuscript review, aiming to improve reviewers' comprehension and focus.
arXiv Detail & Related papers (2024-11-29T23:26:34Z)
- Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions [6.216542656489173]
We propose PROF that PROduces Feedback via learning from LM simulated student revisions.
We empirically test the efficacy of PROF and observe that our approach surpasses a variety of baseline methods in effectiveness of improving students' writing.
arXiv Detail & Related papers (2024-10-10T15:52:48Z)
- "My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays [6.810086342993699]
This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback.
CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation.
A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback.
arXiv Detail & Related papers (2024-09-11T17:59:01Z)
- Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course [49.296957552006226]
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research.
This report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students.
arXiv Detail & Related papers (2024-07-07T00:17:24Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [46.667783153759636]
We propose a framework for feedback generation that optimizes both correctness and alignment using reinforcement learning (RL).
Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO).
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
- Can Large Language Models Be an Alternative to Human Evaluations? [80.81532239566992]
Large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided.
We show that the result of LLM evaluation is consistent with the results obtained by expert human evaluation.
arXiv Detail & Related papers (2023-05-03T07:28:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.