An Exploration of Higher Education Course Evaluation by Large Language Models
- URL: http://arxiv.org/abs/2411.02455v1
- Date: Sun, 03 Nov 2024 20:43:52 GMT
- Title: An Exploration of Higher Education Course Evaluation by Large Language Models
- Authors: Bo Yuan, Jiazi Hu
- Abstract summary: Large language models (LLMs) within artificial intelligence (AI) present promising new avenues for enhancing course evaluation processes.
This study explores the application of LLMs in automated course evaluation from multiple perspectives and conducts rigorous experiments across 100 courses at a major university in China.
- Score: 4.943165921136573
- Abstract: Course evaluation is a critical component in higher education pedagogy. It not only serves to identify limitations in existing course designs and provide a basis for curricular innovation, but also to offer quantitative insights for university administrative decision-making. Traditional evaluation methods, primarily comprising student surveys, instructor self-assessments, and expert reviews, often encounter challenges, including inherent subjectivity, feedback delays, inefficiencies, and limitations in addressing innovative teaching approaches. Recent advancements in large language models (LLMs) within artificial intelligence (AI) present promising new avenues for enhancing course evaluation processes. This study explores the application of LLMs in automated course evaluation from multiple perspectives and conducts rigorous experiments across 100 courses at a major university in China. The findings indicate that: (1) LLMs can be an effective tool for course evaluation; (2) their effectiveness is contingent upon appropriate fine-tuning and prompt engineering; and (3) LLM-generated evaluation results demonstrate a notable level of rationality and interpretability.
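To make the abstract's point about prompt engineering concrete, the sketch below shows one way an LLM could be asked to score course materials against a fixed rubric and return structured output. This is a minimal sketch assuming the OpenAI Python SDK; the rubric criteria, model name, and JSON output format are illustrative assumptions and are not the protocol used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical rubric; the paper does not publish its evaluation criteria.
RUBRIC = """Rate the course on a 1-5 scale for each criterion:
1. Clarity of learning objectives
2. Alignment of assessments with the objectives
3. Evidence of innovative teaching methods
Return JSON: {"scores": {...}, "rationale": "..."}"""


def evaluate_course(course_materials: str) -> str:
    """Ask the model to score course materials against the fixed rubric."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper does not name a specific model
        messages=[
            {"role": "system",
             "content": "You are an experienced higher-education course reviewer."},
            {"role": "user",
             "content": f"{RUBRIC}\n\nCourse materials:\n{course_materials}"},
        ],
        temperature=0,  # deterministic scoring aids reproducibility and comparison
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(evaluate_course("Syllabus: Introduction to Data Structures ..."))
```

In practice, the rubric wording and the requested output schema are exactly the "prompt engineering" levers the abstract refers to; fine-tuning on expert-reviewed course evaluations would be a separate step not shown here.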
Related papers
- MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs [97.94579295913606]
Multimodal Large Language Models (MLLMs) have garnered increased attention from both industry and academia.
In the development process, evaluation is critical since it provides intuitive feedback and guidance on improving models.
This work aims to offer researchers an easy grasp of how to effectively evaluate MLLMs according to different needs and to inspire better evaluation methods.
arXiv Detail & Related papers (2024-11-22T18:59:54Z) - A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models [0.0]
We propose a comprehensive approach to benchmark development based on rigorous psychometric principles.
We make the first attempt to illustrate this approach by creating a new benchmark in the field of pedagogy and education.
We construct a novel benchmark guided by Bloom's taxonomy and rigorously designed by a consortium of education experts trained in test development.
arXiv Detail & Related papers (2024-10-29T19:32:43Z) - An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation [29.81362106367831]
Existing evaluation methods often suffer from high costs, limited test formats, the need for human references, and systematic evaluation biases.
In contrast to previous studies that rely on human annotations, Auto-PRE selects evaluators automatically based on their inherent traits.
Experimental results indicate our Auto-PRE achieves state-of-the-art performance at a lower cost.
arXiv Detail & Related papers (2024-10-16T06:06:06Z) - Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark [12.729687989535359]
Evaluating Large Language Models (LLMs) in languages other than English is crucial for ensuring their linguistic versatility, cultural relevance, and applicability in diverse global contexts.
We tackle this challenge by introducing a structured benchmark using the INVALSI tests, a set of well-established assessments designed to measure educational competencies across Italy.
arXiv Detail & Related papers (2024-06-25T13:20:08Z) - Facilitating Holistic Evaluations with LLMs: Insights from Scenario-Based Experiments [0.22499166814992438]
Even experienced faculty teams find it challenging to achieve a holistic evaluation that accommodates diverse perspectives.
This paper explores the use of a Large Language Model (LLM) as a facilitator to integrate diverse faculty assessments.
arXiv Detail & Related papers (2024-05-28T01:07:06Z) - FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models [64.11333762954283]
This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs.
We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses.
Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities.
arXiv Detail & Related papers (2024-04-29T01:49:07Z) - Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach [64.42462708687921]
Evaluations have revealed that factors such as scaling, training types, and architectures profoundly impact the performance of LLMs.
Our study embarks on a thorough re-examination of these LLMs, targeting the inadequacies in current evaluation methods.
This includes the application of ANOVA, Tukey HSD tests, GAMM, and clustering techniques.
arXiv Detail & Related papers (2024-03-22T14:47:35Z) - Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
arXiv Detail & Related papers (2024-03-05T09:09:15Z) - F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate fundamental abilities, including expression, commonsense, and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - A Survey on Evaluation of Large Language Models [87.60417393701331]
Large language models (LLMs) are gaining increasing popularity in both academia and industry.
This paper focuses on three key dimensions: what to evaluate, where to evaluate, and how to evaluate.
arXiv Detail & Related papers (2023-07-06T16:28:35Z) - Case study of Innovative Teaching Practices and their Impact for Electrical Engineering Courses during COVID-19 Pandemic [3.797359376885945]
The study reports student feedback on online assessment techniques incorporated with the MPL during online teaching in the COVID-19 pandemic.
It concludes that the MPL and online assessment help achieve better attainment of the Student Learning Outcomes, even during a pandemic.
arXiv Detail & Related papers (2021-07-01T21:10:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.