Related papers: Evaluation of Large Language Models' educational feedback in Higher Education: potential, limitations and implications for educational practice

URL: http://arxiv.org/abs/2602.02519v1
Date: Sat, 24 Jan 2026 14:30:25 GMT
Title: Evaluation of Large Language Models' educational feedback in Higher Education: potential, limitations and implications for educational practice
Authors: Daniele Agostini, Federica Picasso
Abstract summary: This study examines how AI-generated feedback supports student learning using a well-established analytical framework. The evaluation process involved providing seven Large Language Models with a structured rubric, which defined specific criteria and performance levels. Overall, these findings indicate that LLMs can generate well-structured feedback and hold great potential as a sustainable and meaningful feedback tool.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The importance of managing feedback practices in higher education has been widely recognised, as they play a crucial role in enhancing teaching, learning, and assessment processes. In today's educational landscape, feedback practices are increasingly influenced by technological advancements, particularly artificial intelligence (AI). Understanding the impact of AI on feedback generation is essential for identifying its potential benefits and establishing effective implementation strategies. This study examines how AI-generated feedback supports student learning using a well-established analytical framework. Specifically, feedback produced by different Large Language Models (LLMs) was assessed in relation to student-designed projects within a training course on inclusive teaching and learning. The evaluation process involved providing seven LLMs with a structured rubric, developed by the university instructor, which defined specific criteria and performance levels. The LLMs were tasked with generating both quantitative assessments and qualitative feedback based on this rubric. The AI-generated feedback was then analysed using Hughes, Smith, and Creese's framework to evaluate its structure and effectiveness in fostering formative learning experiences. Overall, these findings indicate that LLMs can generate well-structured feedback and hold great potential as a sustainable and meaningful feedback tool, provided they are guided by clear contextual information and well-defined instructions, which are explored further in the conclusions.

Related papers

Owlgorithm: Supporting Self-Regulated Learning in Competitive Programming through LLM-Driven Reflection [0.0]
We present an educational platform that supports Self-Regulated Learning (SRL) in competitive programming (CP). Owlgorithm produces context-aware, meta prompts tailored to individual student submissions. Our exploratory assessment of student ratings and TA feedback revealed both promising benefits and notable limitations.
arXiv Detail & Related papers (2025-11-13T05:08:45Z)
Teaching at Scale: Leveraging AI to Evaluate and Elevate Engineering Education [3.557803321422781]
This article presents a scalable, AI-supported framework for qualitative student feedback using large language models. The system employs hierarchical summarization, anonymization, and exception handling to extract actionable themes from open-ended comments. We report on its successful deployment across a large college of engineering.
arXiv Detail & Related papers (2025-08-01T20:27:40Z)
Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges [47.14342587731284]
This survey provides a comprehensive overview of alignment techniques, training protocols, and empirical findings in large language models (LLMs) alignment. We analyze the development of alignment methods across diverse paradigms, characterizing the fundamental trade-offs between core alignment objectives. We discuss state-of-the-art techniques, including Direct Preference Optimization (DPO), Constitutional AI, brain-inspired methods, and alignment uncertainty quantification (AUQ).
arXiv Detail & Related papers (2025-07-25T20:52:58Z)
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study [45.82081693725339]
Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning. Yet their learning ability, which is crucial for adapting to dynamic environments and acquiring new knowledge, remains underexplored.
arXiv Detail & Related papers (2025-06-16T13:24:50Z)
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness [46.653774740885275]
Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). We propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge. Our framework provides a more realistic and rigorous assessment of unlearning performance.
arXiv Detail & Related papers (2025-06-06T04:35:19Z)
A Practical Guide for Supporting Formative Assessment and Feedback Using Generative AI [0.0]
Large language models (LLMs) can help students, teachers, and peers understand "where learners are going," "where learners currently are," and "how to move learners forward". This review provides a comprehensive foundation for integrating LLMs into formative assessment in a pedagogically informed manner.
arXiv Detail & Related papers (2025-05-29T12:52:43Z)
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [92.99416966226724]
We introduce Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms. We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels. Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z)
An Exploration of Higher Education Course Evaluation by Large Language Models [4.943165921136573]
Large language models (LLMs) within artificial intelligence (AI) present promising new avenues for enhancing course evaluation processes. This study explores the application of LLMs in automated course evaluation from multiple perspectives and conducts rigorous experiments across 100 courses at a major university in China.
arXiv Detail & Related papers (2024-11-03T20:43:52Z)
Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences [0.0]
This work advocates careful and caring AIED research by reviewing previous research on feedback generation in ITS. The main contributions of this paper include advocacy for applying more cautious, theoretically grounded methods in feedback generation in the era of generative AI.
arXiv Detail & Related papers (2024-05-07T20:09:18Z)
Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes. We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function. Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach [50.125704610228254]
Large Language Models (LLMs) have not only exhibited exceptional performance across various tasks, but also demonstrated sparks of intelligence. Recent studies have focused on assessing their capabilities on human exams and revealed their impressive competence in different domains. We conduct an evaluation using MoocRadar, a meticulously annotated human test dataset based on Bloom taxonomy.
arXiv Detail & Related papers (2023-10-12T09:55:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.