Related papers: Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation

Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation

URL: http://arxiv.org/abs/2404.15845v1
Date: Wed, 24 Apr 2024 12:48:06 GMT
Title: Exploring LLM Prompting Strategies for Joint Essay Scoring and Feedback Generation
Authors: Maja Stahl, Leon Biermann, Andreas Nehring, Henning Wachsmuth,
Abstract summary: Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback.
Score: 13.854903594424876
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Individual feedback can help students improve their essay writing skills. However, the manual effort required to provide such feedback limits individualization in practice. Automatically-generated essay feedback may serve as an alternative to guide students at their own pace, convenience, and desired frequency. Large language models (LLMs) have demonstrated strong performance in generating coherent and contextually relevant text. Yet, their ability to provide helpful essay feedback is unclear. This work explores several prompting strategies for LLM-based zero-shot and few-shot generation of essay feedback. Inspired by Chain-of-Thought prompting, we study how and to what extent automated essay scoring (AES) can benefit the quality of generated feedback. We evaluate both the AES performance that LLMs can achieve with prompting only and the helpfulness of the generated essay feedback. Our results suggest that tackling AES and feedback generation jointly improves AES performance. However, while our manual evaluation emphasizes the quality of the generated essay feedback, the impact of essay scoring on the generated feedback remains low ultimately.

Related papers

Understanding and Supporting Peer Review Using AI-reframed Positive Summary [18.686807993563168]
This study explored the impact of appending an automatically generated positive summary to the peer reviews of a writing task. We found that adding an AI-reframed positive summary to otherwise harsh feedback increased authors' critique acceptance. We discuss the implications of using AI in peer feedback, focusing on how it can influence critique acceptance and support research communities.
arXiv Detail & Related papers (2025-03-13T11:22:12Z)
SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems [5.191286314473505]
Synthetic Educational Feedback Loops (SEFL) is a novel framework designed to deliver immediate, on-demand feedback at scale. Two large language models (LLMs) operate in teacher--student roles to simulate assignment completion and formative feedback. We show that SEFL-tuned models outperform their non-tuned counterparts in feedback quality, clarity, and timeliness.
arXiv Detail & Related papers (2025-02-18T15:09:29Z)
eRevise+RF: A Writing Evaluation System for Assessing Student Essay Revisions and Providing Formative Feedback [1.5367711550341163]
eRevise+RF is an enhanced AWE system for assessing student essay revisions and providing revision feedback. We deployed the system with 6 teachers and 406 students across 3 schools in Pennsylvania and Louisiana. The results confirmed its effectiveness in (1) assessing student essays in terms of evidence usage, (2) extracting evidence and reasoning revisions across essays, and (3) determining revision success in responding to feedback.
arXiv Detail & Related papers (2025-01-01T03:49:48Z)
Streamlining the review process: AI-generated annotations in research manuscripts [0.5735035463793009]
This study explores the potential of integrating Large Language Models (LLMs) into the peer-review process to enhance efficiency without compromising effectiveness. We focus on manuscript annotations, particularly excerpt highlights, as a potential area for AI-human collaboration. This paper introduces AnnotateGPT, a platform that utilizes GPT-4 for manuscript review, aiming to improve reviewers' comprehension and focus.
arXiv Detail & Related papers (2024-11-29T23:26:34Z)
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation [29.81362106367831]
Existing evaluation methods often suffer from high costs, limited test formats, the need of human references, and systematic evaluation biases. In contrast to previous studies that rely on human annotations, Auto-PRE selects evaluators automatically based on their inherent traits. Experimental results indicate our Auto-PRE achieves state-of-the-art performance at a lower cost.
arXiv Detail & Related papers (2024-10-16T06:06:06Z)
Closing the Loop: Learning to Generate Writing Feedback via Language Model Simulated Student Revisions [6.216542656489173]
We propose PROF that PROduces Feedback via learning from LM simulated student revisions. We empirically test the efficacy of PROF and observe that our approach surpasses a variety of baseline methods in effectiveness of improving students' writing.
arXiv Detail & Related papers (2024-10-10T15:52:48Z)
"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays [6.810086342993699]
This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback.
arXiv Detail & Related papers (2024-09-11T17:59:01Z)
Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education [1.6340559025561785]
Large language model (LLM)-based evaluation pipelines have demonstrated their capability to robustly evaluate machine-generated text. We investigate whether LLMs can effectively assess human-written text for educational purposes.
arXiv Detail & Related papers (2024-07-24T06:02:57Z)
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course [49.296957552006226]
Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. This report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students.
arXiv Detail & Related papers (2024-07-07T00:17:24Z)
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks. This study focuses on the topic of LLMs assist NLP Researchers. To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
Improving the Validity of Automatically Generated Feedback via Reinforcement Learning [50.067342343957876]
We propose a framework for feedback generation that optimize both correctness and alignment using reinforcement learning (RL) Specifically, we use GPT-4's annotations to create preferences over feedback pairs in an augmented dataset for training via direct preference optimization (DPO)
arXiv Detail & Related papers (2024-03-02T20:25:50Z)
Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment. We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems. By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
Can Large Language Models Be an Alternative to Human Evaluations? [80.81532239566992]
Large language models (LLMs) have demonstrated exceptional performance on unseen tasks when only the task instructions are provided. We show that the result of LLM evaluation is consistent with the results obtained by expert human evaluation.
arXiv Detail & Related papers (2023-05-03T07:28:50Z)
Annotation and Classification of Evidence and Reasoning Revisions in Argumentative Writing [0.9449650062296824]
We introduce an annotation scheme to capture the nature of sentence-level revisions of evidence use and reasoning. We show that reliable manual annotation can be achieved and that revision annotations correlate with a holistic assessment of essay improvement.
arXiv Detail & Related papers (2021-07-14T20:58:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.