Evaluating Large Language Models on the GMAT: Implications for the
Future of Business Education
- URL: http://arxiv.org/abs/2401.02985v1
- Date: Tue, 2 Jan 2024 03:54:50 GMT
- Authors: Vahid Ashrafimoghari, Necdet Gürkan, and Jordan W. Suchow
- Abstract summary: This study introduces the first benchmark to assess the performance of seven major Large Language Models (LLMs) on the GMAT.
Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools.
While AI's promise in education, assessment, and tutoring is clear, challenges remain.
- Score: 0.13654846342364302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of artificial intelligence (AI), especially in the domain
of Large Language Models (LLMs) and generative AI, has opened new avenues for
application across various fields, yet its role in business education remains
underexplored. This study introduces the first benchmark to assess the
performance of seven major LLMs, OpenAI's models (GPT-3.5 Turbo, GPT-4, and
GPT-4 Turbo), Google's models (PaLM 2, Gemini 1.0 Pro), and Anthropic's models
(Claude 2 and Claude 2.1), on the GMAT, which is a key exam in the admission
process for graduate business programs. Our analysis shows that most LLMs
outperform human candidates, with GPT-4 Turbo not only outperforming the other
models but also surpassing the average scores of graduate students at top
business schools. Through a case study, this research examines GPT-4 Turbo's
ability to explain answers, evaluate responses, identify errors, tailor
instructions, and generate alternative scenarios. The latest LLM versions,
GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked improvements in
reasoning tasks compared to their predecessors, underscoring their potential
for complex problem-solving. While AI's promise in education, assessment, and
tutoring is clear, challenges remain. Our study not only sheds light on LLMs'
academic potential but also emphasizes the need for careful development and
application of AI in education. As AI technology advances, it is imperative to
establish frameworks and protocols for AI interaction, verify the accuracy of
AI-generated content, ensure worldwide access for diverse learners, and create
an educational environment where AI supports human expertise. This research
sets the stage for further exploration into the responsible use of AI to enrich
educational experiences and improve exam preparation and assessment methods.
Related papers
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [95.96983812740683]
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI).
Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities.
In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI.
arXiv Detail & Related papers (2024-07-09T14:14:47Z)
- Visions of a Discipline: Analyzing Introductory AI Courses on YouTube [11.209406323898019]
We analyze the 20 most watched introductory AI courses on YouTube.
Introductory AI courses do not meaningfully engage with ethical or societal challenges of AI.
We recommend that introductory AI courses should highlight ethical challenges of AI to present a more balanced perspective.
arXiv Detail & Related papers (2024-05-31T01:48:42Z)
- Towards Integrating Emerging AI Applications in SE Education [4.956066467858058]
We present preliminary results of a systematic analysis of current trends in the area of AI.
We discuss a series of opportunities for AI applications and further research areas.
arXiv Detail & Related papers (2024-05-28T11:21:45Z)
- AI-Tutoring in Software Engineering Education [0.7631288333466648]
We conducted an exploratory case study by integrating the GPT-3.5-Turbo model as an AI-Tutor within the APAS Artemis.
The findings highlight advantages, such as timely feedback and scalability.
However, challenges were also evident, such as generic responses and students' concerns that using the AI-Tutor might inhibit their learning progress.
arXiv Detail & Related papers (2024-04-03T08:15:08Z)
- Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision [84.31474052176343]
Recent AI-assistant agents, such as ChatGPT, rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback to align the output with human intentions.
This dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision.
We propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision.
arXiv Detail & Related papers (2023-05-04T17:59:28Z)
- Performance of ChatGPT on the US Fundamentals of Engineering Exam: Comprehensive Assessment of Proficiency and Potential Implications for Professional Environmental Engineering Practice [0.0]
This study investigates the feasibility and effectiveness of using ChatGPT, a GPT-4 based model, in achieving satisfactory performance on the Fundamentals of Engineering (FE) Environmental Exam.
The findings reflect remarkable improvements in mathematical capabilities across successive iterations of ChatGPT models, showcasing their potential in solving complex engineering problems.
arXiv Detail & Related papers (2023-04-20T16:54:34Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? [112.12974778019304]
Generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond.
In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just one tool among numerous AIGC tasks.
This work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc.
arXiv Detail & Related papers (2023-03-21T10:09:47Z)