Assessing AI Detectors in Identifying AI-Generated Code: Implications
for Education
- URL: http://arxiv.org/abs/2401.03676v1
- Date: Mon, 8 Jan 2024 05:53:52 GMT
- Title: Assessing AI Detectors in Identifying AI-Generated Code: Implications
for Education
- Authors: Wei Hung Pan, Ming Jie Chok, Jonathan Leong Shan Wong, Yung Xin Shin,
Yeong Shian Poon, Zhou Yang, Chun Yong Chong, David Lo, Mei Kuan Lim
- Abstract summary: We present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors.
This is achieved by generating code in response to a given question using different variants.
Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code.
- Score: 8.592066814291819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Educators are increasingly concerned about the usage of Large Language Models
(LLMs) such as ChatGPT in programming education, particularly regarding the
potential exploitation of imperfections in Artificial Intelligence Generated
Content (AIGC) Detectors for academic misconduct. In this paper, we present an
empirical study where the LLM is examined for its attempts to bypass detection
by AIGC Detectors. This is achieved by generating code in response to a given
question using different variants. We collected a dataset comprising 5,069
samples, with each sample consisting of a textual description of a coding
problem and its corresponding human-written Python solution codes. These
samples were obtained from various sources, including 80 from Quescol, 3,264
from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of
code problem variant prompts, which were used to instruct ChatGPT to generate
the outputs. Subsequently, we assessed the performance of five AIGC detectors.
Our results demonstrate that existing AIGC Detectors perform poorly in
distinguishing between human-written code and AI-generated code.
Related papers
- An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We? [8.0988059417354]
We propose a range of approaches to improve the performance of AI-generated code detection.
Our best model outperforms state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55.
arXiv Detail & Related papers (2024-11-06T22:48:18Z) - Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants.
Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - DeVAIC: A Tool for Security Assessment of AI-generated Code [5.383910843560784]
DeVAIC (Detection of Vulnerabilities in AI-generated Code) is a tool to evaluate the security of AI-generated Python code.
arXiv Detail & Related papers (2024-04-11T08:27:23Z) - Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A
case study on CodeChef problems [0.13124513975412253]
We use code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code.
Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4.
Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
arXiv Detail & Related papers (2024-03-06T19:51:26Z) - Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z) - Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z) - Paraphrasing evades detectors of AI-generated text, but retrieval is an
effective defense [56.077252790310176]
We present a paraphrase generation model (DIPPER) that can paraphrase paragraphs, condition on surrounding context, and control lexical diversity and content reordering.
Using DIPPER to paraphrase text generated by three large language models (including GPT3.5-davinci-003) successfully evades several detectors, including watermarking.
We introduce a simple defense that relies on retrieving semantically-similar generations and must be maintained by a language model API provider.
arXiv Detail & Related papers (2023-03-23T16:29:27Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.