Machine vs Machine: Using AI to Tackle Generative AI Threats in Assessment
- URL: http://arxiv.org/abs/2506.02046v1
- Date: Sat, 31 May 2025 22:29:43 GMT
- Title: Machine vs Machine: Using AI to Tackle Generative AI Threats in Assessment
- Authors: Mohammad Saleh Torkestani, Taha Mansouri
- Abstract summary: This paper presents a theoretical framework for addressing the challenges posed by generative artificial intelligence (AI) in higher education assessment. Large language models like GPT-4, Claude, and Llama increasingly demonstrate the ability to produce sophisticated academic content. Surveys indicate that 74-92% of students are experimenting with these tools for academic purposes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a theoretical framework for addressing the challenges posed by generative artificial intelligence (AI) in higher education assessment through a machine-versus-machine approach. As large language models like GPT-4, Claude, and Llama increasingly demonstrate the ability to produce sophisticated academic content, traditional assessment methods face an existential threat, with surveys indicating that 74-92% of students are experimenting with these tools for academic purposes. Current responses, ranging from detection software to manual assessment redesign, show significant limitations: detection tools demonstrate bias against non-native English writers and can be easily circumvented, while manual frameworks rely heavily on subjective judgment and assume static AI capabilities. This paper introduces a dual-strategy paradigm combining static analysis and dynamic testing to create a comprehensive theoretical framework for assessment vulnerability evaluation. The static analysis component comprises eight theoretically justified elements: specificity and contextualization, temporal relevance, process visibility requirements, personalization elements, resource accessibility, multimodal integration, ethical reasoning requirements, and collaborative elements. Each element addresses specific limitations in generative AI capabilities, creating barriers that distinguish authentic human learning from AI-generated simulation. The dynamic testing component provides a complementary approach through simulation-based vulnerability assessment, addressing limitations in pattern-based analysis. The paper presents a theoretical framework for vulnerability scoring, including the conceptual basis for quantitative assessment, weighting frameworks, and threshold determination theory.
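The vulnerability scoring idea in the abstract (element ratings, a weighting scheme, and a decision threshold) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract gives no formula, so the weighted-average aggregation, the equal default weights, the 0-to-1 rating scale, the 0.5 threshold, and the names `vulnerability_score` and `classify` are all assumptions made purely for illustration. Only the eight element names come from the abstract.

```python
# Hypothetical sketch of a weighted vulnerability score; the formula,
# weights, and threshold are illustrative assumptions, not the paper's.

# The eight static-analysis elements named in the abstract.
ELEMENTS = (
    "specificity_and_contextualization",
    "temporal_relevance",
    "process_visibility",
    "personalization",
    "resource_accessibility",
    "multimodal_integration",
    "ethical_reasoning",
    "collaborative_elements",
)


def vulnerability_score(ratings, weights=None):
    """Return a vulnerability value in [0, 1] for one assessment task.

    ratings: dict mapping each element name to a protection rating in
             [0, 1], where 1 means the task strongly uses that barrier.
    weights: optional dict of non-negative weights per element
             (assumed equal when omitted).
    """
    if weights is None:
        weights = {name: 1.0 for name in ELEMENTS}

    missing = [name for name in ELEMENTS if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for name in ELEMENTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be in [0, 1]")

    total_weight = sum(weights[name] for name in ELEMENTS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted average protection; vulnerability is its complement.
    protection = sum(weights[n] * ratings[n] for n in ELEMENTS) / total_weight
    return 1.0 - protection


def classify(score, threshold=0.5):
    """Label a task using an assumed (hypothetical) decision threshold."""
    return "vulnerable" if score >= threshold else "resilient"


if __name__ == "__main__":
    # A generic essay prompt that uses none of the eight barriers.
    generic_essay = {name: 0.0 for name in ELEMENTS}
    print(classify(vulnerability_score(generic_essay)))
```

In practice the weights and the threshold would need to be justified and calibrated, which is the part the abstract describes as weighting frameworks and threshold determination theory.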
Related papers
- Beyond classical and contemporary models: a transformative AI framework for student dropout prediction in distance learning using RAG, Prompt engineering, and Cross-modal fusion [0.4369550829556578]
This paper introduces a transformative AI framework that redefines dropout prediction. The framework achieves 89% accuracy and an F1-score of 0.88, outperforming conventional models by 7% and reducing false negatives by 21%.
arXiv Detail & Related papers (2025-07-04T21:41:43Z) - A Human-Centric Approach to Explainable AI for Personalized Education [1.0878040851638]
This thesis aims to bring human needs to the forefront of eXplainable AI (XAI) research. We propose four novel technical contributions in interpretability with a multimodal modular architecture. Our work lays a foundation for human-centric AI systems that balance state-of-the-art performance with built-in transparency and trust.
arXiv Detail & Related papers (2025-05-28T16:23:48Z) - Beyond Detection: Designing AI-Resilient Assessments with Automated Feedback Tool to Foster Critical Thinking [0.0]
This research proposes a proactive, AI-resilient solution based on assessment design rather than detection. It introduces a web-based Python tool that integrates Bloom's taxonomy with advanced natural language processing techniques. It helps educators determine whether a task targets lower-order thinking such as recall and summarization or higher-order skills such as analysis, evaluation, and creation.
arXiv Detail & Related papers (2025-03-30T23:13:00Z) - Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework [61.38174427966444]
Large Language Models (LLMs) are being used more and more extensively for automated evaluation in various scenarios. Previous studies have attempted to fine-tune open-source LLMs to replicate the evaluation explanations and judgments of powerful proprietary models. We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses.
arXiv Detail & Related papers (2025-02-26T06:31:45Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI. We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence [0.9671462473115854]
Generative AI systems, such as those built on large language models, have created opportunities for innovative assessment design practices.
This paper presents a framework that explores the capabilities of the LLM ChatGPT4 application, which is the current industry benchmark.
This critique will provide specific and targeted indications of their questions' vulnerabilities in terms of critical thinking skills.
arXiv Detail & Related papers (2024-06-20T22:46:56Z) - Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z) - From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework [91.94389491920309]
Textual adversarial attacks can discover models' weaknesses by adding semantic-preserved but misleading perturbations to the inputs.
The existing practice of robustness evaluation may exhibit issues of incomprehensive evaluation, impractical evaluation protocol, and invalid adversarial samples.
We set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to exploit the advantages of adversarial attacks.
arXiv Detail & Related papers (2023-05-29T14:55:20Z) - On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z) - The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus [10.135749005469686]
One of the unsolved challenges in the field of Explainable AI (XAI) is determining how to most reliably estimate the quality of an explanation method.
We address this issue through a meta-evaluation of different quality estimators in XAI.
Our novel framework, MetaQuantus, analyses two complementary performance characteristics of a quality estimator.
arXiv Detail & Related papers (2023-02-14T18:59:02Z) - Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide to end users a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z) - An interdisciplinary conceptual study of Artificial Intelligence (AI) for helping benefit-risk assessment practices: Towards a comprehensive qualification matrix of AI programs and devices (pre-print 2020) [55.41644538483948]
This paper proposes a comprehensive analysis of existing concepts coming from different disciplines tackling the notion of intelligence.
The aim is to identify shared notions or discrepancies to consider for qualifying AI systems.
arXiv Detail & Related papers (2021-05-07T12:01:31Z)