Large Language Models As MOOCs Graders
- URL: http://arxiv.org/abs/2402.03776v4
- Date: Fri, 1 Mar 2024 04:48:41 GMT
- Title: Large Language Models As MOOCs Graders
- Authors: Shahriar Golchin, Nikhil Garuda, Christopher Impey, Matthew Wenger
- Abstract summary: We explore the feasibility of leveraging large language models (LLMs) to replace peer grading in MOOCs.
To instruct LLMs, we use three different prompts based on a variant of the zero-shot chain-of-thought prompting technique.
Our results show that Zero-shot-CoT, when integrated with instructor-provided answers and rubrics, produces grades that are more aligned with those assigned by instructors.
- Score: 3.379574469735166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Massive open online courses (MOOCs) unlock the doors to free education for
anyone around the globe with access to a computer and the internet. Despite
this democratization of learning, the massive enrollment in these courses means
it is almost impossible for one instructor to assess every student's writing
assignment. As a result, peer grading, often guided by a straightforward
rubric, is the method of choice. While convenient, peer grading often falls
short in terms of reliability and validity. In this study, using 18 distinct
settings, we explore the feasibility of leveraging large language models (LLMs)
to replace peer grading in MOOCs. Specifically, we focus on two
state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses:
Introductory Astronomy, Astrobiology, and the History and Philosophy of
Astronomy. To instruct LLMs, we use three different prompts based on a variant
of the zero-shot chain-of-thought (Zero-shot-CoT) prompting technique:
Zero-shot-CoT combined with instructor-provided correct answers; Zero-shot-CoT
in conjunction with both instructor-formulated answers and rubrics; and
Zero-shot-CoT with instructor-offered correct answers and LLM-generated
rubrics. Our results show that Zero-shot-CoT, when integrated with
instructor-provided answers and rubrics, produces grades that are more aligned
with those assigned by instructors than peer grading does. However, the
History and Philosophy of Astronomy course proves more challenging to grade
than the other courses. Finally, our study reveals a
promising direction for automating grading systems for MOOCs, especially in
subjects with well-defined rubrics.
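As a rough illustration of the three prompting settings above, here is a minimal Python sketch assuming the OpenAI chat completions SDK. The helper names (build_prompt, generate_rubric, grade), the prompt wording, and the temperature choice are illustrative assumptions, not the paper's exact prompts.
```python
# Minimal sketch of the three Zero-shot-CoT grading settings described above.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

COT_TRIGGER = "Let's think step by step."  # the standard Zero-shot-CoT trigger phrase


def build_prompt(question, student_answer, correct_answer, rubric=None):
    """Assemble a grading prompt. With rubric=None this mirrors setting 1
    (instructor answer only); passing an instructor rubric gives setting 2,
    and passing an LLM-generated rubric gives setting 3. Wording is hypothetical."""
    parts = [
        f"Question: {question}",
        f"Instructor-provided correct answer: {correct_answer}",
    ]
    if rubric is not None:
        parts.append(f"Grading rubric: {rubric}")
    parts.append(f"Student answer: {student_answer}")
    parts.append(f"Grade the student's answer. {COT_TRIGGER}")
    return "\n\n".join(parts)


def generate_rubric(question, correct_answer, model="gpt-4"):
    """Setting 3 only: ask the LLM itself to draft a rubric first."""
    prompt = (
        f"Question: {question}\n\nCorrect answer: {correct_answer}\n\n"
        "Write a concise grading rubric for this question."
    )
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}], temperature=0
    )
    return response.choices[0].message.content


def grade(question, student_answer, correct_answer, rubric=None, model="gpt-4"):
    """Grade one student answer with GPT-4 or GPT-3.5 (e.g. model="gpt-3.5-turbo")."""
    prompt = build_prompt(question, student_answer, correct_answer, rubric)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output; the paper's setting is not stated here
    )
    return response.choices[0].message.content
```
Calling grade(...) with rubric=None, with the instructor's rubric, or with the output of generate_rubric(...) corresponds to the three settings, respectively.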
Related papers
- Grading Massive Open Online Courses Using Large Language Models [3.0936354370614607]
Massive open online courses (MOOCs) offer free education globally to anyone with a computer and internet access.
Peer grading, often guided by a straightforward rubric, is the method of choice.
We explore the feasibility of using large language models (LLMs) to replace peer grading in MOOCs.
arXiv Detail & Related papers (2024-06-16T23:42:11Z)
- Grade Like a Human: Rethinking Automated Assessment with Large Language Models [11.442433408767583]
Large language models (LLMs) have been used for automated grading, but they have not yet achieved the same level of performance as humans.
We propose an LLM-based grading system that addresses the entire grading procedure.
arXiv Detail & Related papers (2024-05-30T05:08:15Z)
- GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers [68.77382332826167]
Large language models (LLMs) have achieved impressive performance across various mathematical reasoning benchmarks.
One essential and frequently observed failure mode is that LLMs can behave incorrectly when the math questions are slightly changed.
This motivates us to evaluate the robustness of LLMs' math reasoning capability by testing a wide range of question variations.
arXiv Detail & Related papers (2024-02-29T15:26:14Z)
- AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails [43.19453208130667]
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation.
In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems.
We create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer.
arXiv Detail & Related papers (2024-02-14T14:53:56Z)
- The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
arXiv Detail & Related papers (2024-01-01T14:02:27Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It yields consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model [97.4921006089966]
We propose a tailored learning approach to distill the reasoning ability of LLMs into smaller LMs.
We exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm.
To exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes.
arXiv Detail & Related papers (2023-10-20T07:50:10Z)
- CITING: Large Language Models Create Curriculum for Instruction Tuning [35.66902011221179]
We exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs.
Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors.
arXiv Detail & Related papers (2023-10-04T01:58:34Z)
- Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
- Self-Prompting Large Language Models for Zero-Shot Open-Domain QA [67.08732962244301]
Open-Domain Question Answering (ODQA) aims to answer questions without explicitly providing background documents.
This task becomes notably challenging in a zero-shot setting where no data is available to train tailored retrieval-reader models.
We propose a Self-Prompting framework to explicitly utilize the massive knowledge encoded in the parameters of Large Language Models.
arXiv Detail & Related papers (2022-12-16T18:23:43Z)
- Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection [17.376856503445826]
We analyze 2.4 million reviews (the largest MOOC review dataset used to date) from five different platforms.
Results show that numeric ratings are clearly biased (63% of them are 5-star ratings).
We expect our study to shed some light on the area and promote a more transparent approach in online education reviews.
arXiv Detail & Related papers (2022-01-11T10:24:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.