Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners
- URL: http://arxiv.org/abs/2408.05204v1
- Date: Fri, 9 Aug 2024 17:53:35 GMT
- Title: Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners
- Authors: Michael Vaccaro Jr, Mikayla Friday, Arash Zaghi
- Abstract summary: GPT-4 was used to profile student learning preferences based on choices made during a training session.
For the experimental group, GPT-4 was used to rewrite science texts to align with each student's predicted profile, while for students in the control group, texts were rewritten to contradict their learning preferences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs), including OpenAI's GPT-series, have made significant advancements in recent years. Known for their expertise across diverse subject areas and quick adaptability to user-provided prompts, LLMs hold unique potential as Personalized Learning (PL) tools. Despite this potential, their application in K-12 education remains largely unexplored. This paper presents one of the first randomized controlled trials (n = 23) to evaluate the effectiveness of GPT-4 in personalizing educational science texts for middle school students. In this study, GPT-4 was used to profile student learning preferences based on choices made during a training session. For the experimental group, GPT-4 was used to rewrite science texts to align with each student's predicted profile, while for students in the control group, texts were rewritten to contradict their learning preferences. The results of a Mann-Whitney U test showed that students significantly preferred (at the .10 level) the rewritten texts when they were aligned with their profile (p = .059). These findings suggest that GPT-4 can effectively interpret and tailor educational content to diverse learner preferences, marking a significant advancement in PL technology. The limitations of this study and ethical considerations for using artificial intelligence in education are also discussed.
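As a concrete illustration of the statistical comparison the abstract describes, the minimal sketch below runs a one-sided Mann-Whitney U test on hypothetical preference ratings; the ratings, scale, and group sizes are invented placeholders, not the study's data.

```python
# Minimal sketch of the Mann-Whitney U comparison described above.
# The ratings below are invented placeholders, not the study's data.
from scipy.stats import mannwhitneyu

# Hypothetical preference ratings (e.g., a 1-5 scale) for texts aligned
# with vs. contradicting each student's predicted learning profile.
aligned_ratings = [5, 4, 4, 5, 3, 4, 5, 4, 3, 5, 4, 4]
contradicting_ratings = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2]

# One-sided test: do aligned texts receive higher ratings?
statistic, p_value = mannwhitneyu(
    aligned_ratings, contradicting_ratings, alternative="greater"
)
print(f"U = {statistic}, p = {p_value:.3f}")  # compare p against the .10 level
```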
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
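A minimal sketch of how such multimodal grading might be invoked through the OpenAI chat API is shown below; the rubric wording, image URL, and score scale are hypothetical stand-ins, not the paper's actual prompts.

```python
# Sketch of multimodal grading with GPT-4o via the OpenAI API.
# The rubric, image URL, and mark scheme are hypothetical examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Grade this handwritten probability solution out of 10. "
                "Rubric: correct setup (4 pts), correct computation (4 pts), "
                "clear justification (2 pts). Reply with the score only."
            )},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/student_response.png"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g., "7"
```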
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
- Educational Personalized Learning Path Planning with Large Language Models [0.0]
This paper proposes a novel approach integrating Large Language Models (LLMs) with prompt engineering to address challenges in personalized learning path planning.
By designing prompts that incorporate learner-specific information, our method guides LLMs like LLama-2-70B and GPT-4 to generate personalized, coherent, and pedagogically sound learning paths.
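A rough sketch of that prompt-engineering idea follows: the function assembles learner-specific details into a single prompt string. The field names and wording are illustrative assumptions, not the paper's templates.

```python
# Sketch of a learner-specific prompt for learning-path generation.
# Field names and wording are illustrative, not the paper's templates.
def build_learning_path_prompt(name: str, level: str, goal: str,
                               interests: list[str]) -> str:
    return (
        f"You are an expert tutor. Design a step-by-step learning path for {name}, "
        f"a {level} learner whose goal is: {goal}. "
        f"Connect examples to their interests ({', '.join(interests)}), "
        "order topics from prerequisite to advanced, and keep each step to one week."
    )

prompt = build_learning_path_prompt(
    name="Alex", level="beginner", goal="understand Newtonian mechanics",
    interests=["skateboarding", "video games"],
)
# The prompt would then be sent to an LLM such as GPT-4 or LLama-2-70B.
```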
arXiv Detail & Related papers (2024-07-16T14:32:56Z)
- Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions [2.0411082897313984]
This study investigates how LLMs, specifically GPT-3.5 and GPT-4, can develop tailored questions for Grade 9 math.
Using an iterative method, these models adjust questions for difficulty and content in response to feedback from a simulated 'student' model.
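One way to picture that iterative loop is sketched below; the two stub functions stand in for the generator and simulated-student models, and the thresholds are invented for illustration.

```python
# Sketch of an iterative refine-by-feedback loop for question generation.
# Both functions are stand-ins for LLM calls, not the paper's pipeline.
import random

def generate_question(topic: str, difficulty: str) -> str:
    # Stand-in for a GPT-3.5/GPT-4 call drafting a question at this difficulty.
    return f"[{difficulty}] Solve a {topic} problem."

def simulated_student_score(question: str) -> float:
    # Stand-in for the simulated 'student' model; returns fraction correct.
    return random.random()

def refine_question(topic: str, rounds: int = 3) -> str:
    difficulty = "medium"
    question = generate_question(topic, difficulty)
    for _ in range(rounds):
        score = simulated_student_score(question)
        if score > 0.9:
            difficulty = "harder"   # student aced it: raise difficulty
        elif score < 0.4:
            difficulty = "easier"   # too hard: lower difficulty
        else:
            break                   # difficulty is about right
        question = generate_question(topic, difficulty)
    return question

print(refine_question("Grade 9 algebra"))
```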
arXiv Detail & Related papers (2024-06-20T00:25:43Z)
- Can Large Language Models Make the Grade? An Empirical Study Evaluating LLMs Ability to Mark Short Answer Questions in K-12 Education [0.0]
This paper reports on a series of experiments with a novel dataset evaluating how well Large Language Models can mark (i.e., grade) open-text responses to short-answer questions.
We found that GPT-4 with basic few-shot prompting performed well (Kappa = 0.70) and, importantly, came very close to human-level performance (0.75).
This research builds on prior findings that GPT-4 could reliably score short answer reading comprehension questions at a performance-level very close to that of expert human raters.
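To make the agreement metric concrete, the sketch below computes Cohen's kappa between hypothetical LLM-assigned and human-assigned marks using scikit-learn; the mark vectors and scheme are invented for illustration.

```python
# Sketch of measuring LLM-human grading agreement with Cohen's kappa.
# The mark vectors are invented placeholders, not the paper's data.
from sklearn.metrics import cohen_kappa_score

human_marks = [2, 1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 0]  # e.g., a 0-2 mark scheme
llm_marks   = [2, 1, 0, 2, 2, 2, 0, 1, 1, 2, 1, 0]  # GPT-4's few-shot marks

kappa = cohen_kappa_score(human_marks, llm_marks)
print(f"Cohen's kappa = {kappa:.2f}")  # the paper reports ~0.70 vs ~0.75
```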
arXiv Detail & Related papers (2024-05-05T16:11:06Z)
- Evaluating and Optimizing Educational Content with Large Language Model Judgments [52.33701672559594]
We use Language Models (LMs) as educational experts to assess the impact of various instructions on learning outcomes.
We introduce an instruction optimization approach in which one LM generates instructional materials using the judgments of another LM as a reward function.
Human teachers' evaluations of these LM-generated worksheets show a significant alignment between the LM judgments and human teacher preferences.
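A minimal sketch of using an LM judge as a reward function is shown below; both model calls are stubs standing in for a generator LM and a judge LM, and the candidate instructions are invented examples.

```python
# Sketch of instruction optimization with an LM judge as reward function.
# Both functions are stand-ins; a real system would call a generator LM
# and a judge LM instead of these stubs.
import random

CANDIDATE_INSTRUCTIONS = [
    "Explain the concept, then give two worked examples.",
    "Start with a quiz question, then explain the answer.",
    "Use an everyday analogy before the formal definition.",
]

def generate_worksheet(instruction: str) -> str:
    # Stand-in for a generator LM producing a worksheet from the instruction.
    return f"Worksheet following: {instruction}"

def judge_score(worksheet: str) -> float:
    # Stand-in for a judge LM rating expected learning outcomes, 0.0-1.0.
    return random.random()

# Keep the instruction whose generated worksheet the judge scores highest.
best = max(CANDIDATE_INSTRUCTIONS,
           key=lambda ins: judge_score(generate_worksheet(ins)))
print("Selected instruction:", best)
```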
arXiv Detail & Related papers (2024-03-05T09:09:15Z)
- Predicting Learning Performance with Large Language Models: A Study in Adult Literacy [18.48602704139462]
This study investigates the application of advanced AI models, including Large Language Models (LLMs), for predicting learning performance in adult literacy programs delivered through Intelligent Tutoring Systems (ITSs).
We evaluate the predictive capabilities of GPT-4 versus traditional machine learning methods in predicting learning performance through five-fold cross-validation techniques.
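The traditional-ML side of such a comparison might look like the sketch below, which runs five-fold cross-validation with scikit-learn; the features and labels are random placeholders, and the GPT-4 predictor is omitted.

```python
# Sketch of the five-fold cross-validation used to compare predictors.
# Features and labels are random placeholders, not adult-literacy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # e.g., prior-performance features
y = rng.integers(0, 2, size=200)     # e.g., next-item correct/incorrect

baseline = LogisticRegression(max_iter=1000)
scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
# A GPT-4 predictor would be evaluated on the same folds for comparison.
```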
arXiv Detail & Related papers (2024-03-04T08:14:07Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
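A rough sketch of description-enhanced zero-shot recognition is below, pairing GPT-4-style class descriptions with a CLIP model from Hugging Face transformers; the descriptions and image path are illustrative assumptions, not the paper's setup.

```python
# Sketch of zero-shot recognition with rich class descriptions and CLIP.
# The descriptions mimic what GPT-4 might produce; the image path is a placeholder.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

descriptions = [
    "a photo of a cat, a small furry animal with whiskers and pointed ears",
    "a photo of a dog, a four-legged companion animal with a wagging tail",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text=descriptions, images=image,
                   return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(probs)  # probability per enriched class description
```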
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts [21.150221839202878]
Large Language Models (LLMs) have achieved significant success across various general tasks.
In this work, we examine the proficiency of LLMs in generating succinct survey articles specific to the niche field of NLP in computer science.
We compare both human and GPT-based evaluation scores and provide in-depth analysis.
arXiv Detail & Related papers (2023-08-21T01:32:45Z)
- Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
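The end-to-end flow might be pictured as in the sketch below: an LLM proposes a SQL query for a natural-language question, which is then executed against a database. The schema, question, and the 'LLM-proposed' query are illustrative stand-ins.

```python
# Sketch of an LLM-as-data-analyst pipeline over a toy SQLite database.
# The schema, data, and 'LLM-proposed' query are illustrative stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("south", 340.0), ("north", 90.0)])

question = "Which region has the highest total sales?"
# In the real pipeline, GPT-4 would generate this query from the question
# and the table schema; here it is hardcoded for illustration.
llm_proposed_sql = (
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC LIMIT 1"
)
print(question, "->", conn.execute(llm_proposed_sql).fetchone())
```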
arXiv Detail & Related papers (2023-05-24T11:26:59Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.