Game of Tones: Faculty detection of GPT-4 generated content in
university assessments
- URL: http://arxiv.org/abs/2305.18081v1
- Date: Mon, 29 May 2023 13:31:58 GMT
- Title: Game of Tones: Faculty detection of GPT-4 generated content in
university assessments
- Authors: Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran
(1), Don Hickerson (1) ((1) British University Vietnam, Vietnam, (2) James
Cook University Singapore, Singapore)
- Abstract summary: This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4).
It evaluates the ability of academic staff to detect its use when supported by an Artificial Intelligence (AI) detection tool.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study explores the robustness of university assessments against the use
of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and
evaluates the ability of academic staff to detect its use when supported by the
Turnitin Artificial Intelligence (AI) detection tool. The research involved
creating twenty-two GPT-4 generated submissions and including them in the
assessment process to be marked by fifteen different faculty members. The study
reveals that although the detection tool flagged 91% of the experimental
submissions as containing some AI-generated content, the average proportion of
text it identified as AI-generated was only 54.8%. This suggests that
adversarial prompt-engineering techniques are effective at evading AI detection
tools and highlights that improvements to AI detection software are needed.
Even with the support of the Turnitin AI detection tool, faculty reported only
54.5% of the experimental submissions to the academic misconduct process,
suggesting the need for increased awareness of, and training in, these tools.
Genuine submissions received a mean score of 54.4, whereas AI-generated content
scored 52.3, indicating that GPT-4 performs comparably to students in real-life
assessment situations. Recommendations include adjusting assessment strategies
to make them more resistant to the use of AI tools, using AI-inclusive
assessment where possible, and providing comprehensive training programs for
faculty and students. This research contributes to understanding the
relationship between AI-generated content and academic assessment, urging
further investigation to preserve academic integrity.
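To make the two detection figures concrete, the short Python sketch below contrasts the share of submissions flagged as containing any AI-generated content (91% in the study) with the average proportion of text detected as AI-generated (54.8%). The per-submission scores are hypothetical values invented for illustration, not the study's data.

```python
# Minimal sketch with HYPOTHETICAL per-submission Turnitin-style AI scores
# (the fraction of each submission's text flagged as AI-generated).
# These values are illustrative only; they are not the study's actual data.
scores = [0.92, 0.00, 0.75, 0.48, 0.61, 0.33, 0.88, 0.10, 0.54, 0.67]

# Share of submissions flagged as containing *some* AI-generated content
# (the paper reports 91% of the 22 experimental submissions).
flag_rate = sum(s > 0 for s in scores) / len(scores)

# Mean proportion of text detected as AI-generated across submissions
# (the paper reports 54.8%); this can be much lower than the flag rate,
# since a submission counts as "flagged" even if only a little is detected.
mean_detected = sum(scores) / len(scores)

print(f"flagged: {flag_rate:.1%}, mean detected content: {mean_detected:.1%}")
```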
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
- ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning [78.42927884000673]
ExACT is an approach that combines test-time search and self-learning to build o1-like models for agentic applications.
We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly.
Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms.
arXiv Detail & Related papers (2024-10-02T21:42:35Z)
- Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants [175.9723801486487]
We evaluate whether two AI assistants, GPT-3.5 and GPT-4, can adequately answer assessment questions.
GPT-4 answers an average of 65.8% of questions correctly, and can even produce the correct answer under at least one prompting strategy for 85.1% of questions.
Our results call for revising program-level assessment design in higher education in light of advances in generative AI.
arXiv Detail & Related papers (2024-08-07T12:11:49Z)
- GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education [0.0]
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified.
The results demonstrate that the detectors' already low accuracy rates (39.5%) show a further major reduction in accuracy (17.4 percentage points) when faced with manipulated content.
The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred.
arXiv Detail & Related papers (2024-03-28T04:57:13Z)
- Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science? [1.1172147007388977]
This study compared the performance of ChatGPT and GPT-4 with that of students on the 2019 NAEP science assessments, grouped by the cognitive demands of the items.
Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments.
arXiv Detail & Related papers (2024-01-07T12:36:31Z)
- Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies [28.98518677093905]
GPT-4V, OpenAI's latest large vision-language model, has piqued considerable interest for its potential in medical applications.
Recent studies and internal reviews highlight its underperformance in specialized medical tasks.
This paper explores the boundary of GPT-4V's capabilities in medicine, particularly in processing complex imaging data from endoscopies, CT scans, and MRIs.
arXiv Detail & Related papers (2023-12-07T15:05:59Z)
- Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies [1.633179643849375]
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment.
This work investigates the performance of ChatGPT by evaluating it across three courses.
arXiv Detail & Related papers (2023-11-27T20:10:13Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- AI for CSI Feedback Enhancement in 5G-Advanced and 6G [51.276468472631976]
The 3rd Generation Partnership Project (3GPP) began its study of Release 18 in 2021.
This article provides a comprehensive overview of AI for CSI feedback enhancement in 5G-Advanced and 6G.
arXiv Detail & Related papers (2022-06-30T08:52:43Z)