Grading Massive Open Online Courses Using Large Language Models
- URL: http://arxiv.org/abs/2406.11102v1
- Date: Sun, 16 Jun 2024 23:42:11 GMT
- Title: Grading Massive Open Online Courses Using Large Language Models
- Authors: Shahriar Golchin, Nikhil Garuda, Christopher Impey, Matthew Wenger
- Abstract summary: Massive open online courses (MOOCs) offer free education globally to anyone with a computer and internet access.
Peer grading, often guided by a straightforward rubric, is the method of choice.
We explore the feasibility of using large language models (LLMs) to replace peer grading in MOOCs.
- Score: 3.0936354370614607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Massive open online courses (MOOCs) offer free education globally to anyone with a computer and internet access. Despite this democratization of learning, the massive enrollment in these courses makes it impractical for one instructor to assess every student's writing assignment. As a result, peer grading, often guided by a straightforward rubric, is the method of choice. While convenient, peer grading often falls short in terms of reliability and validity. In this study, we explore the feasibility of using large language models (LLMs) to replace peer grading in MOOCs. Specifically, we use two LLMs, GPT-4 and GPT-3.5, across three MOOCs: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy. To instruct LLMs, we use three different prompts based on the zero-shot chain-of-thought (ZCoT) prompting technique: (1) ZCoT with instructor-provided correct answers, (2) ZCoT with both instructor-provided correct answers and rubrics, and (3) ZCoT with instructor-provided correct answers and LLM-generated rubrics. Tested on 18 settings, our results show that ZCoT, when augmented with instructor-provided correct answers and rubrics, produces grades that are more aligned with those assigned by instructors compared to peer grading. Finally, our findings indicate a promising potential for automated grading systems in MOOCs, especially in subjects with well-defined rubrics, to improve the learning experience for millions of online learners worldwide.
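As a rough illustration of the paper's second prompt variant (ZCoT with instructor-provided correct answers and rubrics), here is a minimal sketch assuming the OpenAI Python SDK; the prompt wording, rubric placeholder, and point scale are illustrative stand-ins, not the paper's exact template.

```python
# A minimal sketch of zero-shot chain-of-thought (ZCoT) grading with an
# instructor-provided answer and rubric. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GRADING_PROMPT = """You are grading a short writing assignment from an
introductory astronomy MOOC.

Question:
{question}

Instructor-provided correct answer:
{correct_answer}

Rubric (total: {max_points} points):
{rubric}

Student response:
{response}

Let's think step by step, then state the final grade as
"Grade: X/{max_points}"."""

def grade_response(question, correct_answer, rubric, response,
                   max_points=3, model="gpt-4"):
    """Ask the LLM to grade one student response using zero-shot CoT."""
    prompt = GRADING_PROMPT.format(
        question=question, correct_answer=correct_answer,
        rubric=rubric, response=response, max_points=max_points)
    reply = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output for more consistent grading
        messages=[{"role": "user", "content": prompt}])
    return reply.choices[0].message.content
```

The "Let's think step by step" trigger is the standard zero-shot CoT cue; setting temperature to 0 is one common choice for keeping repeated grading runs consistent.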
Related papers
- AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails [43.19453208130667]
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation.
In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems.
We create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer.
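The summary above gives only the high-level design, so the following is a hedged sketch of that idea: dialogue control stays in a hand-written finite-state structure while the LLM supplies only the surface text at each state. The state names, transitions, and llm() callable are hypothetical.

```python
# Hedged sketch: tutoring flow as a fixed finite-state machine where an LLM
# fills in each state's utterance. States and transitions are invented here.

TUTOR_FSM = {
    # state: (instruction for the LLM, {student_signal: next_state})
    "ask_problem":  ("Restate the math word problem as a question.",
                     {"answered": "check_answer"}),
    "check_answer": ("Say whether the student's answer is correct and why.",
                     {"correct": "wrap_up", "incorrect": "give_hint"}),
    "give_hint":    ("Give one hint without revealing the full solution.",
                     {"answered": "check_answer"}),
    "wrap_up":      ("Congratulate the student and summarize the solution.",
                     {}),
}

def tutor_turn(state, context, llm):
    """Generate one tutor utterance; control flow never leaves the FSM."""
    instruction, transitions = TUTOR_FSM[state]
    utterance = llm(f"{instruction}\nConversation so far:\n{context}")
    return utterance, transitions
```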
arXiv Detail & Related papers (2024-02-14T14:53:56Z)
- Large Language Models As MOOCs Graders [3.379574469735166]
We explore the feasibility of leveraging large language models (LLMs) to replace peer grading in MOOCs.
To instruct LLMs, we use three different prompts based on a variant of the zero-shot chain-of-thought prompting technique.
Our results show that Zero-shot-CoT, when integrated with instructor-provided answers and rubrics, produces grades that are more aligned with those assigned by instructors.
arXiv Detail & Related papers (2024-02-06T07:43:07Z)
- The Earth is Flat? Unveiling Factual Errors in Large Language Models [89.94270049334479]
Large Language Models (LLMs) like ChatGPT are used in various applications due to their extensive knowledge from pre-training and fine-tuning.
Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education.
We introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs.
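The summary does not describe FactChecker's internals, so the sketch below only shows one plausible fact-probing loop of this general kind: turn trusted facts into questions, query the model, and flag mismatches. Every name and fact here is an illustrative assumption, not the paper's implementation.

```python
# Hedged sketch of a generic factuality probe, not FactChecker itself:
# known facts become questions, and answers missing the expected object
# are flagged for review. All names and facts below are hypothetical.

KNOWN_FACTS = [
    # (subject, relation, object) triplets from a trusted knowledge base
    ("the Earth", "shape", "an oblate spheroid"),
    ("water", "boiling point at sea level", "100 degrees Celsius"),
]

def probe_model(llm):
    """Return facts for which the model's answer omits the expected object."""
    failures = []
    for subject, relation, expected in KNOWN_FACTS:
        answer = llm(f"What is the {relation} of {subject}? Answer briefly.")
        if expected.lower() not in answer.lower():
            failures.append((subject, relation, expected, answer))
    return failures
```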
arXiv Detail & Related papers (2024-01-01T14:02:27Z)
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z)
- Measuring Five Accountable Talk Moves to Improve Instruction at Scale [1.4549461207028445]
We fine-tune models to identify five instructional talk moves inspired by accountable talk theory.
We correlate the instructors' use of each talk move with indicators of student engagement and satisfaction.
These results corroborate previous research on the effectiveness of accountable talk moves.
arXiv Detail & Related papers (2023-11-02T03:04:50Z)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
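As a rough illustration of search-engine augmentation in this spirit, the sketch below prepends retrieved evidence to the question before querying the model. The search() callable and prompt layout are hypothetical stand-ins, not FreshPrompt's actual template.

```python
# Minimal sketch of search-augmented prompting: retrieved evidence is placed
# before the question so the model can ground its answer in current sources.
# search() is a hypothetical stand-in; the format is illustrative only.

def fresh_style_prompt(question, search, k=5):
    """Build a prompt that prepends the top-k search results to the question."""
    evidence = search(question, num_results=k)  # [(source, date, snippet), ...]
    lines = ["Answer the question using the evidence below.", ""]
    for source, date, snippet in evidence:
        lines.append(f"[{source}, {date}] {snippet}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)
```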
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
- How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLMs) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
- How does online teamwork change student communication patterns in programming courses? [0.0]
Recent studies have shown that peer communication positively affects learning outcomes of online teaching.
In this study, we compare communication patterns in MOOCs where peer communication is limited with those of a blended course in which students are involved in online peer instruction.
arXiv Detail & Related papers (2022-04-08T18:34:52Z)
- Large Scale Analysis of Open MOOC Reviews to Support Learners' Course Selection [17.376856503445826]
We analyze 2.4 million reviews (the largest MOOC review dataset used to date) from five different platforms.
Results show that numeric ratings are clearly biased (63% of them are 5-star ratings).
We expect our study to shed some light on the area and promote a more transparent approach in online education reviews.
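To make the bias statistic concrete, here is a toy snippet of the kind of computation behind it; the column name and data are invented for illustration.

```python
# Toy illustration of measuring rating bias: the share of 5-star reviews.
import pandas as pd

reviews = pd.DataFrame({"rating": [5, 5, 4, 5, 1, 3, 5, 5, 2, 5]})  # toy data
five_star_share = (reviews["rating"] == 5).mean()
print(f"5-star share: {five_star_share:.0%}")  # the paper reports 63%
```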
arXiv Detail & Related papers (2022-01-11T10:24:49Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper is the first to explore Bayesian deep learning on learner text posts, using two methods: Monte Carlo Dropout and Variational Inference.
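Monte Carlo Dropout, one of the two methods named above, is a standard technique, so a short hedged sketch may help: dropout stays active at inference, and the spread across repeated stochastic passes serves as an uncertainty estimate. The toy classifier below is a stand-in, not the paper's model.

```python
# Hedged PyTorch sketch of Monte Carlo Dropout: keep dropout on at inference
# and treat the variance of repeated predictions as uncertainty.
import torch
import torch.nn as nn

class PostClassifier(nn.Module):
    """Toy stand-in for a learner-post classifier (urgent vs. non-urgent)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 64), nn.ReLU(), nn.Dropout(p=0.3),
            nn.Linear(64, 2))

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, samples=30):
    """Average softmax over stochastic passes; variance gauges uncertainty."""
    model.train()  # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(samples)])
    return probs.mean(dim=0), probs.var(dim=0)
```

High-variance posts are natural candidates to route to a human instructor rather than act on automatically.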
arXiv Detail & Related papers (2021-04-26T15:12:13Z)