Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
- URL: http://arxiv.org/abs/2509.13359v3
- Date: Mon, 29 Sep 2025 12:08:08 GMT
- Title: Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
- Authors: Benjamin J. Walker, Nikoleta Kalaydzhieva, Beatriz Navarro Lameda, Ruth A. Reynolds,
- Abstract summary: We generate, transcribe, and blind-mark GenAI submissions to eight undergraduate mathematics examinations at a Russell Group university. We find that GenAI attainment is at the level of a first-class degree, though current performance can vary between modules.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative artificial intelligence (GenAI) tools such as OpenAI's ChatGPT are transforming the educational landscape, prompting reconsideration of traditional assessment practices. In parallel, universities are exploring alternatives to in-person, closed-book examinations, raising concerns about academic integrity and pedagogical alignment in uninvigilated settings. This study investigates whether traditional closed-book mathematics examinations retain their pedagogical relevance when hypothetically administered in uninvigilated, open-book settings with GenAI access. Adopting an empirical approach, we generate, transcribe, and blind-mark GenAI submissions to eight undergraduate mathematics examinations at a Russell Group university, spanning the entirety of the first-year curriculum. By combining independent GenAI responses to individual questions, we enable a meaningful evaluation of GenAI performance, both at the level of modules and across the first-year curriculum. We find that GenAI attainment is at the level of a first-class degree, though current performance can vary between modules. Further, we find that GenAI performance is remarkably consistent when viewed across the entire curriculum, significantly more so than that of students in invigilated examinations. Our findings evidence the need for redesigning assessments in mathematics for unsupervised settings, and highlight the potential reduction in pedagogical value of current standards in the era of generative artificial intelligence.
Related papers
- GenExam: A Multidisciplinary Text-to-Image Exam [91.06661449186537]
GenExam is the first benchmark for multidisciplinary text-to-image exams. It features 1,000 samples across 10 subjects with exam-style prompts organized under a four-level taxonomy. Experiments show that even state-of-the-art models such as GPT-Image-1 and Gemini-2.5-Flash-Image achieve less than 15% strict scores.
arXiv Detail & Related papers (2025-09-17T17:59:14Z)
- The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study aims to extend the framework for assessing artificial intelligence, called GROW-AI. GROW-AI is designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test. The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z)
- Integrating Universal Generative AI Platforms in Educational Labs to Foster Critical Thinking and Digital Literacy [0.3749861135832073]
This paper presents a new educational framework for integrating generative artificial intelligence (GenAI) platforms into laboratory activities. Recognizing the limitations and risks of uncritical reliance on large language models (LLMs), the proposed pedagogical model reframes GenAI as a research subject and cognitive tool.
arXiv Detail & Related papers (2025-06-11T17:45:51Z)
- Encouraging Students' Responsible Use of GenAI in Software Engineering Education: A Causal Model and Two Institutional Applications [1.1511012020557325]
As generative AI (GenAI) tools such as ChatGPT and GitHub Copilot become pervasive in education, concerns are rising about students using them to complete rather than learn from coursework. This paper proposes and empirically applies a causal model to help educators scaffold responsible GenAI use in Software Engineering education.
arXiv Detail & Related papers (2025-05-31T19:27:40Z)
- From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models [44.99833362998488]
We investigated the first steps for optimizing content creation for advanced math. We looked at the ability of GenAI to produce high-quality practice problems that are relevant to the course content.
arXiv Detail & Related papers (2025-05-17T08:30:10Z)
- Evaluating the AI-Lab Intervention: Impact on Student Perception and Use of Generative AI in Early Undergraduate Computer Science Courses [0.0]
Generative AI (GenAI) is rapidly entering computer science education. Concerns about overreliance coexist with a gap in research on structured scaffolding to guide tool use in formal courses. This study examines the impact of a dedicated "AI-Lab" intervention on undergraduate students.
arXiv Detail & Related papers (2025-04-30T18:12:42Z)
- Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge [78.35388859345056]
We argue that the ML community would benefit from learning from and drawing on the social sciences when developing measurement instruments for evaluating GenAI systems. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI systems.
arXiv Detail & Related papers (2025-02-01T21:09:51Z)
- From Automation to Cognition: Redefining the Roles of Educators and Generative AI in Computing Education [2.0628700367476203]
Generative Artificial Intelligence (GenAI) offers opportunities to revolutionise teaching and learning in Computing Education (CE). However, educators have expressed concerns that students may over-rely on GenAI and use these tools to generate solutions without engaging in the learning process. This paper describes our experiences with using GenAI in CS-focused educational settings and the changes we have implemented accordingly in our teaching.
arXiv Detail & Related papers (2024-12-16T03:36:25Z)
- Dimensions of Generative AI Evaluation Design [51.541816010127256]
We propose a set of general dimensions that capture critical choices involved in GenAI evaluation design.
These dimensions include the evaluation setting, the task type, the input source, the interaction style, the duration, the metric type, and the scoring method.
arXiv Detail & Related papers (2024-11-19T18:25:30Z)
- Early Adoption of Generative Artificial Intelligence in Computing Education: Emergent Student Use Cases and Perspectives in 2023 [38.83649319653387]
There is limited prior research on computing students' use and perceptions of GenAI.
We surveyed all computer science majors in a small engineering-focused R1 university.
We discuss the impact of our findings on the emerging conversation around GenAI and education.
arXiv Detail & Related papers (2024-11-17T20:17:47Z)
- Model-based Maintenance and Evolution with GenAI: A Look into the Future [47.93555901495955]
We argue that Generative Artificial Intelligence (GenAI) can be used as a means to address the limitations of Model-Based Maintenance and Evolution (MBM&E).
We propose that GenAI can be used in MBM&E for: reducing engineers' learning curve, maximizing efficiency with recommendations, or serving as a reasoning tool to understand domain problems.
arXiv Detail & Related papers (2024-07-09T23:13:26Z)