Related papers: Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions

Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions

URL: http://arxiv.org/abs/2406.13903v1
Date: Thu, 20 Jun 2024 00:25:43 GMT
Title: Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions
Authors: Hamdireza Rouzegar, Masoud Makrehchi,
Abstract summary: This study investigates how LLMs, specifically GPT-3.5 and GPT-4, can develop tailored questions for Grade 9 math. By utilizing an iterative method, these models adjust questions based on difficulty and content, responding to feedback from a simulated'student' model.
Score: 2.0411082897313984
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study investigates how LLMs, specifically GPT-3.5 and GPT-4, can develop tailored questions for Grade 9 math, aligning with active learning principles. By utilizing an iterative method, these models adjust questions based on difficulty and content, responding to feedback from a simulated 'student' model. A novel aspect of the research involved using GPT-4 as a 'teacher' to create complex questions, with GPT-3.5 as the 'student' responding to these challenges. This setup mirrors active learning, promoting deeper engagement. The findings demonstrate GPT-4's superior ability to generate precise, challenging questions and notable improvements in GPT-3.5's ability to handle more complex problems after receiving instruction from GPT-4. These results underscore the potential of LLMs to mimic and enhance active learning scenarios, offering a promising path for AI in customized education. This research contributes to understanding how AI can support personalized learning experiences, highlighting the need for further exploration in various educational contexts

Related papers

Use Me Wisely: AI-Driven Assessment for LLM Prompting Skills Development [5.559706293891474]
Large language model (LLM)-powered chatbots have become popular across various domains, supporting a range of tasks and processes. Yet, prompting is highly task- and domain-dependent, limiting the effectiveness of generic approaches. In this study, we explore whether LLM-based methods can facilitate learning assessments by using ad-hoc guidelines and a minimal number of annotated prompt samples.
arXiv Detail & Related papers (2025-03-04T11:56:33Z)
Prompt-Based Cost-Effective Evaluation and Operation of ChatGPT as a Computer Programming Teaching Assistant [0.0]
This article focuses on studying three aspects related to such an application. The performance of two well-known models, GPT-3.5T and GPT-4T, in providing feedback to students is evaluated.
arXiv Detail & Related papers (2025-01-24T08:15:05Z)
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams. Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology [17.91379291654773]
This work explores using Large Language Models (LLMs) as teachers to generate effective hints for students simulated through LLMs. The results show that model errors increase with higher temperature settings. Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o.
arXiv Detail & Related papers (2024-11-05T20:18:53Z)
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning [78.42927884000673]
ExACT is an approach to combine test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly. Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms.
arXiv Detail & Related papers (2024-10-02T21:42:35Z)
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses [51.975495361024606]
We propose a Self-Challenge evaluation framework with human-in-the-loop. Starting from seed instances that GPT-4 fails to answer, we prompt GPT-4 to summarize error patterns that can be used to generate new instances. We then build a benchmark, SC-G4, consisting of 1,835 instances generated by GPT-4 using these patterns, with human-annotated gold responses.
arXiv Detail & Related papers (2024-08-16T19:01:52Z)
LLMs Still Can't Avoid Instanceof: An Investigation Into GPT-3.5, GPT-4 and Bard's Capacity to Handle Object-Oriented Programming Assignments [0.0]
Large Language Models (LLMs) have emerged as promising tools to assist students while solving programming assignments. In this study, we experimented with three prominent LLMs to solve real-world OOP exercises used in educational settings. The findings revealed that while the models frequently achieved mostly working solutions to the exercises, they often overlooked the best practices of OOP.
arXiv Detail & Related papers (2024-03-10T16:40:05Z)
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies [47.129504708849446]
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing. LLMs lack systematic generalization, which allows to extrapolate the learned statistical regularities outside the training distribution. In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available.
arXiv Detail & Related papers (2024-02-27T10:44:52Z)
Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education [0.13654846342364302]
This study introduces the first benchmark to assess the performance of seven major Large Language Models (LLMs) Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools. While AI's promise in education, assessment, and tutoring is clear, challenges remain.
arXiv Detail & Related papers (2024-01-02T03:54:50Z)
Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks [8.223311621898983]
GPT-4 with conversational prompts showed drastic improvement compared to GPT-4 with automatic prompting strategies. fully automated prompt engineering with no human in the loop requires more study and improvement.
arXiv Detail & Related papers (2023-10-11T00:21:00Z)
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) [121.42924593374127]
We analyze the latest model, GPT-4V, to deepen the understanding of LMMs. GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system. GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods.
arXiv Detail & Related papers (2023-09-29T17:34:51Z)
Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data. We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more. We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.