Related papers: Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants

URL: http://arxiv.org/abs/2512.04107v1
Date: Fri, 28 Nov 2025 17:42:36 GMT
Title: Rethinking AI Evaluation in Education: The TEACH-AI Framework and Benchmark for Generative AI Assistants
Authors: Shi Ding, Brian Magerko
Abstract summary: TEACH-AI is a domain-independent, pedagogically grounded, and stakeholder-aligned framework for guiding the design, development, and evaluation of generative AI systems in education. Our work invites the community to reconsider what constructs "effective" AI in education and to design model evaluation approaches that promote co-creation, inclusivity, and long-term human, social, and educational impact.
Score: 8.591535882390918
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As generative artificial intelligence (AI) continues to transform education, most existing AI evaluations rely primarily on technical performance metrics such as accuracy or task efficiency while overlooking human identity, learner agency, contextual learning processes, and ethical considerations. In this paper, we present TEACH-AI (Trustworthy and Effective AI Classroom Heuristics), a domain-independent, pedagogically grounded, and stakeholder-aligned framework with measurable indicators and a practical toolkit for guiding the design, development, and evaluation of generative AI systems in educational contexts. Built on an extensive literature review and synthesis, the ten-component assessment framework and toolkit checklist provide a foundation for scalable, value-aligned AI evaluation in education. TEACH-AI rethinks "evaluation" through sociotechnical, educational, theoretical, and applied lenses, engaging designers, developers, researchers, and policymakers across AI and education. Our work invites the community to reconsider what constructs "effective" AI in education and to design model evaluation approaches that promote co-creation, inclusivity, and long-term human, social, and educational impact.

Related papers

Pedagogy-driven Evaluation of Generative AI-powered Intelligent Tutoring Systems [15.954407353419258]
Generative AI (GenAI) models have accelerated the development of large language model (LLM)-powered Intelligent Tutoring Systems (ITSs). However, the progress and impact of these systems remain largely untraceable due to the absence of reliable, universally accepted, and pedagogy-driven evaluation frameworks and benchmarks. Most existing educational dialogue-based ITS evaluations rely on subjective protocols and non-standardized benchmarks, leading to inconsistencies and limited generalizability. This work provides comprehensive state-of-the-art evaluation practices, highlighting associated challenges through real-world case studies from careful and caring AIED research.
arXiv Detail & Related papers (2025-10-26T08:44:21Z)
AI-Educational Development Loop (AI-EDL): A Conceptual Framework to Bridge AI Capabilities with Classical Educational Theories [8.500617875591633]
This study introduces the AI-Educational Development Loop (AI-EDL), a theory-driven framework that integrates classical learning theories with human-in-the-loop artificial intelligence (AI). The framework emphasizes transparency, self-regulated learning, and pedagogical oversight.
arXiv Detail & Related papers (2025-08-01T15:44:19Z)
A Review of Generative AI in Computer Science Education: Challenges and Opportunities in Accuracy, Authenticity, and Assessment [2.1891582280781634]
This paper surveys the use of Generative AI tools, such as ChatGPT and Claude, in computer science education. Generative AI raises concerns such as AI hallucinations, error propagation, bias, and blurred lines between AI-assisted and student-authored content.
arXiv Detail & Related papers (2025-06-17T19:20:58Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
Methodological Foundations for AI-Driven Survey Question Generation [41.94295877935867]
This paper presents a methodological framework for using generative AI in educational survey research. We explore how Large Language Models can generate adaptive, context-aware survey questions. We examine ethical issues such as bias, privacy, and transparency.
arXiv Detail & Related papers (2025-05-02T09:50:34Z)
Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation [0.0]
This paper synthesizes insights from four related studies to propose a comprehensive framework for enhancing AI-driven educational tools. We integrate cognitive assessment frameworks, linguistic analysis of AI-generated feedback, and ethical design principles to guide the development of effective and responsible AI tools.
arXiv Detail & Related papers (2025-05-01T06:36:21Z)
Evaluation Framework for AI Systems in "the Wild" [37.48117853114386]
Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use. Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect real-world performance. This white paper proposes a comprehensive framework for how we should evaluate real-world GenAI systems.
arXiv Detail & Related papers (2025-04-23T14:52:39Z)
Generative AI Literacy: Twelve Defining Competencies [48.90506360377104]
This paper introduces a competency-based model for generative artificial intelligence (AI) literacy covering essential skills and knowledge areas necessary to interact with generative AI. The competencies range from foundational AI literacy to prompt engineering and programming skills, including ethical and legal considerations. These twelve competencies offer a framework for individuals, policymakers, government officials, and educators looking to navigate and take advantage of the potential of generative AI responsibly.
arXiv Detail & Related papers (2024-11-29T14:55:15Z)
Comprehensive AI Assessment Framework: Enhancing Educational Evaluation with Ethical AI Integration [0.0]
This paper presents the Comprehensive AI Assessment Framework (CAIAF), an evolved version of the AI Assessment Scale (AIAS) by Perkins, Furze, Roe, and MacVaugh. The CAIAF incorporates stringent ethical guidelines, with clear distinctions based on educational levels, and advanced AI capabilities. The framework will ensure better learning outcomes, uphold academic integrity, and promote responsible use of AI.
arXiv Detail & Related papers (2024-06-07T07:18:42Z)
AGI: Artificial General Intelligence for Education [41.45039606933712]
This position paper reviews artificial general intelligence (AGI)'s key concepts, capabilities, scope, and potential within future education. It highlights that AGI can significantly improve intelligent tutoring systems, educational assessment, and evaluation procedures. The paper emphasizes that AGI's capabilities extend to understanding human emotions and social interactions.
arXiv Detail & Related papers (2023-04-24T22:31:59Z)
An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence. The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.