An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?
- URL: http://arxiv.org/abs/2411.04299v1
- Date: Wed, 06 Nov 2024 22:48:18 GMT
- Title: An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?
- Authors: Hyunjae Suh, Mahan Tafreshipour, Jiawei Li, Adithya Bhattiprolu, Iftekhar Ahmed,
- Abstract summary: We propose a range of approaches to improve the performance of AI-generated code detection.
Our best model outperforms state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55.
- Score: 8.0988059417354
- License:
- Abstract: Artificial Intelligence (AI) techniques, especially Large Language Models (LLMs), have started gaining popularity among researchers and software developers for generating source code. However, LLMs have been shown to generate code with quality issues and also incurred copyright/licensing infringements. Therefore, detecting whether a piece of source code is written by humans or AI has become necessary. This study first presents an empirical analysis to investigate the effectiveness of the existing AI detection tools in detecting AI-generated code. The results show that they all perform poorly and lack sufficient generalizability to be practically deployed. Then, to improve the performance of AI-generated code detection, we propose a range of approaches, including fine-tuning the LLMs and machine learning-based classification with static code metrics or code embedding generated from Abstract Syntax Tree (AST). Our best model outperforms state-of-the-art AI-generated code detector (GPTSniffer) and achieves an F1 score of 82.55. We also conduct an ablation study on our best-performing model to investigate the impact of different source code features on its performance.
Related papers
- Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes [0.0]
We present results from an empirical exploration, where a human expert-generated SE artifact was taken as a benchmark.
We then adopted a two-fold mixed-methods approach to compare AI generated artifacts against the benchmark.
We find that while the two-material appear very similar, AI generated artifacts exhibit serious failure modes that could be difficult to detect.
arXiv Detail & Related papers (2025-02-13T17:05:18Z) - Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score.
It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models.
Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z) - Intelligent Green Efficiency for Intrusion Detection [0.0]
This paper presents an assessment of different programming languages and Feature Selection (FS) methods to improve performance of AI.
Experiments were conducted using five ML models - Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory.
Results demonstrated that FS plays an important role enhancing the computational efficiency of AI models without compromising detection accuracy.
arXiv Detail & Related papers (2024-11-11T15:01:55Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - MOSAIC: Multiple Observers Spotting AI Content, a Robust Approach to Machine-Generated Text Detection [35.67613230687864]
Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities.
Various proposals have been made to automatically discriminate artificially generated from human-written texts.
We derive a new, theoretically grounded approach to combine their respective strengths.
Our experiments, using a variety of generator LLMs, suggest that our method effectively leads to robust detection performances.
arXiv Detail & Related papers (2024-09-11T20:55:12Z) - Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the original code and its LLM-rewritten variants.
Our results demonstrate a significant improvement over existing SOTA synthetic content detectors.
arXiv Detail & Related papers (2024-05-25T08:57:28Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - SERL: A Software Suite for Sample-Efficient Robotic Reinforcement
Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.
The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.
We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - An Initial Look at Self-Reprogramming Artificial Intelligence [0.0]
We develop and experimentally validate the first fully self-reprogramming AI system.
Applying AI-based computer code generation to AI itself, we implement an algorithm with the ability to continuously modify and rewrite its own neural network source code.
arXiv Detail & Related papers (2022-04-30T05:44:34Z) - Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering.
In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.