Related papers: Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework

Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework

URL: http://arxiv.org/abs/2512.10758v1
Date: Thu, 11 Dec 2025 15:53:19 GMT
Title: Designing AI-Resilient Assessments Using Interconnected Problems: A Theoretically Grounded and Empirically Validated Framework
Authors: Kaihua Ding,
Abstract summary: The rapid adoption of generative AI has undermined traditional modular assessments in computing education.<n>This paper presents a theoretically grounded framework for designing AI-resilient assessments.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid adoption of generative AI has undermined traditional modular assessments in computing education, creating a disconnect between academic evaluation and industry practice. This paper presents a theoretically grounded framework for designing AI-resilient assessments, supported by formal analysis and multi-year empirical validation. We make three contributions. First, we establish two theoretical results: (1) assessments composed of interconnected problems, where outputs feed into subsequent stages, are more AI-resilient than modular assessments because current language models struggle with sustained multi-step reasoning and context; and (2) semi-structured problems with deterministic success criteria provide more reliable measures of student competency than fully open-ended projects, which allow AI systems to default to familiar solution patterns. These results challenge common policy and institutional guidance that promotes open-ended assessments as the primary safeguard for academic integrity. Second, we validate these results using data from four university data science courses (N = 138). While students achieve near-perfect scores on AI-assisted modular homework, performance drops by roughly 30 percentage points on proctored exams, indicating substantial AI score inflation. Interconnected projects remain strongly correlated with modular assessments, suggesting they measure the same underlying skills while resisting AI misuse. Proctored exams show weaker alignment, implying they may assess test-taking ability rather than intended learning outcomes. Third, we translate these findings into a practical assessment design framework. The proposed approach enables educators to create assessments that promote integrative thinking, reflect real-world AI-augmented workflows, and naturally resist trivial delegation to generative AI, thereby helping restore academic integrity.

Related papers

ChatGPT and Gemini participated in the Korean College Scholastic Ability Test -- Earth Science I [0.0]
This study utilizes the Earth Science I section of the 2025 Korean College Scholastic Ability Test (CSAT) to analyze the multimodal scientific reasoning capabilities and cognitive limitations of state-of-the-art Large Language Models (LLMs)<n> Quantitative results indicated that unstructured inputs led to significant performance degradation due to segmentation and Optical Character Recognition (OCR) failures.<n>By exploiting AI's weaknesses, educators can distinguish genuine student competency from AI-generated responses, thereby ensuring assessment fairness.
arXiv Detail & Related papers (2025-12-17T10:46:41Z)
Academics and Generative AI: Empirical and Epistemic Indicators of Policy-Practice Voids [0.0]
This study prototypes a ten-item, indirect-elicitation instrument embedded in a structured interpretive framework to surface voids between institutional rules and practitioner AI use.
arXiv Detail & Related papers (2025-11-04T06:24:47Z)
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning [57.868248683256574]
PRISM-Physics is a process-level evaluation framework and benchmark for complex physics reasoning problems.<n> Solutions are represented as directed acyclic graphs (DAGs) of formulas.<n>Results show that our evaluation framework is aligned with human experts' scoring.
arXiv Detail & Related papers (2025-10-03T17:09:03Z)
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark [71.3555284685426]
We introduce RealUnify, a benchmark designed to evaluate bidirectional capability synergy.<n>RealUnify comprises 1,000 meticulously human-annotated instances spanning 10 categories and 32 subtasks.<n>We find that current unified models still struggle to achieve effective synergy, indicating that architectural unification alone is insufficient.
arXiv Detail & Related papers (2025-09-29T15:07:28Z)
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews.<n>We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z)
AI-Educational Development Loop (AI-EDL): A Conceptual Framework to Bridge AI Capabilities with Classical Educational Theories [8.500617875591633]
This study introduces the AI-Educational Development Loop (AI-EDL), a theory-driven framework that integrates classical learning theories with human-in-the-loop artificial intelligence (AI)<n>The framework emphasizes transparency, self-regulated learning, and pedagogical oversight.
arXiv Detail & Related papers (2025-08-01T15:44:19Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.<n>We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
Machine vs Machine: Using AI to Tackle Generative AI Threats in Assessment [0.0]
This paper presents a theoretical framework for addressing the challenges posed by generative artificial intelligence (AI) in higher education assessment.<n>Large language models like GPT-4, Claude, and Llama increasingly demonstrate the ability to produce sophisticated academic content.<n>Surveys indicate 74-92% of students experimenting with these tools for academic purposes.
arXiv Detail & Related papers (2025-05-31T22:29:43Z)
Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice [57.94036023167952]
We argue that the efforts aiming to study AI's ethical ramifications should be made in tandem with those evaluating its impacts on the environment.<n>We propose best practices to better integrate AI ethics and sustainability in AI research and practice.
arXiv Detail & Related papers (2025-04-01T13:53:11Z)
Beyond Detection: Designing AI-Resilient Assessments with Automated Feedback Tool to Foster Critical Thinking [0.0]
This research proposes a proactive, AI-resilient solution based on assessment design rather than detection.<n>It introduces a web-based Python tool that integrates Bloom's taxonomy with advanced natural language processing techniques.<n>It helps educators determine whether a task targets lower-order thinking such as recall and summarization or higher-order skills such as analysis, evaluation, and creation.
arXiv Detail & Related papers (2025-03-30T23:13:00Z)
Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition [70.60872754129832]
First NeurIPS competition on unlearning sought to stimulate the development of novel algorithms. Nearly 1,200 teams from across the world participated. We analyze top solutions and delve into discussions on benchmarking unlearning.
arXiv Detail & Related papers (2024-06-13T12:58:00Z)
On the meaning of uncertainty for ethical AI: philosophy and practice [10.591284030838146]
We argue that this is a significant way to bring ethical considerations into mathematical reasoning. We demonstrate these ideas within the context of competing models used to advise the UK government on the spread of the Omicron variant of COVID-19 during December 2021.
arXiv Detail & Related papers (2023-09-11T15:13:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.