Related papers: Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering

Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering

URL: http://arxiv.org/abs/2601.15626v1
Date: Thu, 22 Jan 2026 03:57:47 GMT
Title: Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering
Authors: Lili Chen, Winn Wing-Yiu Chow, Stella Peng, Bencheng Fan, Sachitha Bandara,
Abstract summary: This study investigates how GenAI can be integrated with a criterion-referenced grading framework to improve the efficiency and quality of grading for mathematical assessments in engineering.
Score: 5.378302855328016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: PURPOSE OR GOAL: This study investigates how GenAI can be integrated with a criterion-referenced grading framework to improve the efficiency and quality of grading for mathematical assessments in engineering. It specifically explores the challenges demonstrators face with manual, model solution-based grading and how a GenAI-supported system can be designed to reliably identify student errors, provide high-quality feedback, and support human graders. The research also examines human graders' perceptions of the effectiveness of this GenAI-assisted approach. ACTUAL OR ANTICIPATED OUTCOMES: The study found that GenAI achieved an overall grading accuracy of 92.5%, comparable to two experienced human graders. The two researchers, who also served as subject demonstrators, perceived the GenAI as a helpful second reviewer that improved accuracy by catching small errors and provided more complete feedback than they could manually. A central outcome was the significant enhancement of formative feedback. However, they noted the GenAI tool is not yet reliable enough for autonomous use, especially with unconventional solutions. CONCLUSIONS/RECOMMENDATIONS/SUMMARY: This study demonstrates that GenAI, when paired with a structured, criterion-referenced framework using binary questions, can grade engineering mathematical assessments with an accuracy comparable to human experts. Its primary contribution is a novel methodological approach that embeds the generation of high-quality, scalable formative feedback directly into the assessment workflow. Future work should investigate student perceptions of GenAI grading and feedback.

Related papers

Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study [0.0]
We generate, transcribe, and blind-mark GenAI submissions to eight undergraduate mathematics examinations at a Russell Group university.<n>We find that GenAI attainment is at the level of a first-class degree, though current performance can vary between modules.
arXiv Detail & Related papers (2025-09-15T10:34:31Z)
The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study aims to extend the framework for assessing artificial intelligence, called GROW-AI.<n>GROW-AI is designed to answer the question "Can machines grow up?" -- a natural successor to the Turing Test.<n>The originality of the work lies in the conceptual transposition of the process of "growing" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z)
Encouraging Students' Responsible Use of GenAI in Software Engineering Education: A Causal Model and Two Institutional Applications [0.4306143768014156]
As generative AI (GenAI) tools become pervasive in education, concerns are rising about students using them to complete rather than learn.<n>This paper proposes and empirically applies a causal model to help educators scaffold responsible GenAI use in Software Engineering (SE) education.
arXiv Detail & Related papers (2025-05-31T19:27:40Z)
From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models [44.99833362998488]
We investigated the first steps for optimizing content creation for advanced math.<n>We looked at the ability of GenAI to produce high-quality practice problems that are relevant to the course content.
arXiv Detail & Related papers (2025-05-17T08:30:10Z)
Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI.<n>We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts.<n>We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z)
Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge [78.35388859345056]
We argue that the ML community would benefit from learning from and drawing on the social sciences when developing measurement instruments for evaluating GenAI systems.<n>We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI systems.
arXiv Detail & Related papers (2025-02-01T21:09:51Z)
Evaluating Generative AI-Enhanced Content: A Conceptual Framework Using Qualitative, Quantitative, and Mixed-Methods Approaches [0.0]
Generative AI (GenAI) has revolutionized content generation, offering transformative capabilities for improving language coherence, readability, and overall quality.<n>This manuscript explores the application of qualitative, quantitative, and mixed-methods research approaches to evaluate the performance of GenAI models in enhancing scientific writing.
arXiv Detail & Related papers (2024-11-26T23:34:07Z)
Dimensions of Generative AI Evaluation Design [51.541816010127256]
We propose a set of general dimensions that capture critical choices involved in GenAI evaluation design. These dimensions include the evaluation setting, the task type, the input source, the interaction style, the duration, the metric type, and the scoring method.
arXiv Detail & Related papers (2024-11-19T18:25:30Z)
Model-based Maintenance and Evolution with GenAI: A Look into the Future [47.93555901495955]
We argue that Generative Artificial Intelligence (GenAI) can be used as a means to address the limitations of Model-Based Engineering (MBM&E) We propose that GenAI can be used in MBM&E for: reducing engineers' learning curve, maximizing efficiency with recommendations, or serving as a reasoning tool to understand domain problems.
arXiv Detail & Related papers (2024-07-09T23:13:26Z)
Understanding Student and Academic Staff Perceptions of AI Use in Assessment and Feedback [0.0]
The rise of Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) in higher education necessitates assessment reform. This study addresses a critical gap by exploring student and academic staff experiences with AI and GenAI tools. An online survey collected data from 35 academic staff and 282 students across two universities in Vietnam and one in Singapore.
arXiv Detail & Related papers (2024-06-22T10:25:01Z)
GAIA: Rethinking Action Quality Assessment for AI-Generated Videos [56.047773400426486]
Action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features. We construct GAIA, a Generic AI-generated Action dataset, by conducting a large-scale subjective evaluation from a novel causal reasoning-based perspective. Results show that traditional AQA methods, action-related metrics in recent T2V benchmarks, and mainstream video quality methods perform poorly with an average SRCC of 0.454, 0.191, and 0.519, respectively.
arXiv Detail & Related papers (2024-06-10T08:18:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.