Game of Tones: Faculty detection of GPT-4 generated content in
university assessments
- URL: http://arxiv.org/abs/2305.18081v1
- Date: Mon, 29 May 2023 13:31:58 GMT
- Title: Game of Tones: Faculty detection of GPT-4 generated content in
university assessments
- Authors: Mike Perkins (1), Jasper Roe (2), Darius Postma (1), James McGaughran
(1), Don Hickerson (1) ((1) British University Vietnam, Vietnam, (2) James
Cook University Singapore, Singapore)
- Abstract summary: This study explores the robustness of university assessments against the use of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content.
It evaluates the ability of academic staff to detect its use when supported by the Turnitin Artificial Intelligence (AI) detection tool.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study explores the robustness of university assessments against the use
of OpenAI's Generative Pre-Trained Transformer 4 (GPT-4) generated content and
evaluates the ability of academic staff to detect its use when supported by the
Turnitin Artificial Intelligence (AI) detection tool. The research involved
twenty-two GPT-4 generated submissions being created and included in the
assessment process to be marked by fifteen different faculty members. The study
reveals that although the detection tool identified 91% of the experimental
submissions as containing some AI-generated content, the total detected content
was only 54.8%. This suggests that the use of adversarial techniques regarding
prompt engineering is an effective method in evading AI detection tools and
highlights that improvements to AI detection software are needed. Using the
Turnitin AI detect tool, faculty reported 54.5% of the experimental submissions
to the academic misconduct process, suggesting the need for increased awareness
and training into these tools. Genuine submissions received a mean score of
54.4, whereas AI-generated content scored 52.3, indicating the comparable
performance of GPT-4 in real-life situations. Recommendations include adjusting
assessment strategies to make them more resistant to the use of AI tools, using
AI-inclusive assessment where possible, and providing comprehensive training
programs for faculty and students. This research contributes to understanding
the relationship between AI-generated content and academic assessment, urging
further investigation to preserve academic integrity.
Related papers
- AI-Tutoring in Software Engineering Education [0.7631288333466648]
We conducted an exploratory case study by integrating the GPT-3.5-Turbo model as an AI-Tutor within the APAS Artemis.
The findings highlight advantages, such as timely feedback and scalability.
However, challenges such as generic responses and students' concerns that the AI-Tutor might inhibit their learning progress were also evident.
arXiv Detail & Related papers (2024-04-03T08:15:08Z)
- GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education [0.0]
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified.
The results demonstrate that the detectors' already low accuracy rates (39.5%) show major reductions in accuracy (17.4%) when faced with manipulated content.
The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred.
arXiv Detail & Related papers (2024-03-28T04:57:13Z)
- Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science? [1.1172147007388977]
This study compared the performance of ChatGPT and GPT-4 with that of students on the 2019 NAEP science assessments, broken down by the cognitive demands of the items.
Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments.
arXiv Detail & Related papers (2024-01-07T12:36:31Z)
- Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies [28.98518677093905]
GPT-4V, OpenAI's latest large vision-language model, has piqued considerable interest for its potential in medical applications.
Recent studies and internal reviews highlight its underperformance in specialized medical tasks.
This paper explores the boundary of GPT-4V's capabilities in medicine, particularly in processing complex imaging data from endoscopies, CT scans, MRIs, and similar sources.
arXiv Detail & Related papers (2023-12-07T15:05:59Z)
- Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies [1.633179643849375]
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment.
This work investigates the performance of ChatGPT by evaluating it across three courses.
arXiv Detail & Related papers (2023-11-27T20:10:13Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- Holistic Evaluation of GPT-4V for Biomedical Imaging [113.46226609088194]
GPT-4V represents a breakthrough in artificial general intelligence for computer vision.
We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and more.
Results show GPT-4V's proficiency in modality and anatomy recognition but difficulty with disease diagnosis and localization.
arXiv Detail & Related papers (2023-11-10T18:40:44Z)
- ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [170.7899683843177]
ToRA is a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems.
ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales.
ToRA-Code-34B is the first open-source model that achieves an accuracy exceeding 50% on MATH.
arXiv Detail & Related papers (2023-09-29T17:59:38Z)
- How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks [65.7949334650854]
GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks.
However, their robustness and abilities to handle various complexities of the open world have yet to be explored.
We show that GPT-3.5 faces some specific robustness challenges, including instability, prompt sensitivity, and number sensitivity.
arXiv Detail & Related papers (2023-03-01T07:39:01Z)
- The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed.
The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
- AI for CSI Feedback Enhancement in 5G-Advanced and 6G [51.276468472631976]
The 3rd Generation Partnership Project (3GPP) started the study of Release 18 in 2021.
This article provides a comprehensive overview of AI for CSI feedback enhancement in 5G-Advanced and 6G.
arXiv Detail & Related papers (2022-06-30T08:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.