Related papers: Game of Tones: Faculty detection of GPT-4 generated content in university assessments

URL: http://arxiv.org/abs/2305.18081v1
Date: Mon, 29 May 2023 13:31:58 GMT
Title: Game of Tones: Faculty detection of GPT-4 generated content in university assessments
Authors: Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran (1), Don Hickerson (1) ((1) British University Vietnam, Vietnam, (2) James Cook University Singapore, Singapore)
Abstract summary: This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4). It evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This study explores the robustness of university assessments against the use of Open AI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool. The research involved twenty-two GPT-4 generated submissions being created and included in the assessment process to be marked by fifteen different faculty members. The study reveals that although the detection tool identified 91% of the experimental submissions as containing some AI-generated content, the total detected content was only 54.8%. This suggests that the use of adversarial techniques regarding prompt engineering is an effective method in evading AI detection tools and highlights that improvements to AI detection software are needed. Using the Turnitin AI detect tool, faculty reported 54.5% of the experimental submissions to the academic misconduct process, suggesting the need for increased awareness and training into these tools. Genuine submissions received a mean score of 54.4, whereas AI-generated content scored 52.3, indicating the comparable performance of GPT-4 in real-life situations. Recommendations include adjusting assessment strategies to make them more resistant to the use of AI tools, using AI-inclusive assessment where possible, and providing comprehensive training programs for faculty and students. This research contributes to understanding the relationship between AI-generated content and academic assessment, urging further investigation to preserve academic integrity.

Related papers

From G-Factor to A-Factor: Establishing a Psychometric Framework for AI Literacy [1.5031024722977635]
We establish AI literacy as a coherent, measurable construct with significant implications for education, workforce development, and social equity. Study 1 revealed a dominant latent factor - termed the "A-factor" - that accounts for 44.16% of variance across diverse AI interaction tasks. Study 2 refined the measurement tool by examining four key dimensions of AI literacy. Regression analyses identified several significant predictors of AI literacy, including cognitive abilities (IQ), educational background, prior AI experience, and training history.
arXiv Detail & Related papers (2025-03-16T14:51:48Z)
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
Analyzing the Impact of AI Tools on Student Study Habits and Academic Performance [0.0]
The research focuses on how AI tools can support personalized learning and adaptive test adjustments and provide real-time classroom analysis. Student feedback revealed strong support for these features, and the study found a significant reduction in study hours alongside an increase in GPA. Despite these benefits, challenges such as over-reliance on AI and difficulties in integrating AI with traditional teaching methods were also identified.
arXiv Detail & Related papers (2024-12-03T04:51:57Z)
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams. Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning [78.42927884000673]
ExACT is an approach that combines test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test-time algorithm designed to enhance AI agents' ability to explore the decision space on the fly. Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms.
arXiv Detail & Related papers (2024-10-02T21:42:35Z)
Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [175.9723801486487]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions. GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer across at least one prompting strategy for 85.1% of questions. Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z)
GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education [0.0]
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified. The results demonstrate that the detectors' already low accuracy rates (39.5%) show major reductions in accuracy (17.4%) when faced with manipulated content. The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred.
arXiv Detail & Related papers (2024-03-28T04:57:13Z)
Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science? [1.1172147007388977]
This study compared the performance of ChatGPT and GPT-4 with that of students on the 2019 NAEP science assessments, broken down by the cognitive demands of the items. Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments.
arXiv Detail & Related papers (2024-01-07T12:36:31Z)
Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies [28.98518677093905]
GPT-4V, OpenAI's latest large vision-language model, has piqued considerable interest for its potential in medical applications. Recent studies and internal reviews highlight its underperformance in specialized medical tasks. This paper explores the boundary of GPT-4V's capabilities in medicine, particularly in processing complex imaging data from endoscopies, CT scans, and MRIs.
arXiv Detail & Related papers (2023-12-07T15:05:59Z)
Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies [1.633179643849375]
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment. This work investigates the performance of ChatGPT by evaluating it across three courses.
arXiv Detail & Related papers (2023-11-27T20:10:13Z)
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds. Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
AI for CSI Feedback Enhancement in 5G-Advanced and 6G [51.276468472631976]
The 3rd Generation Partnership Project (3GPP) started its study of Release 18 in 2021. This article provides a comprehensive overview of AI for CSI feedback enhancement in 5G-Advanced and 6G.
arXiv Detail & Related papers (2022-06-30T08:52:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.