Related papers: An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?

Related papers

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models.<n>We investigate their denoising processes and reinforcement learning methods.<n>Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs [15.25980318643715]
Large language models (LLMs) have become integral to modern software development, producing vast amounts of AI-generated source code.<n>Existing benchmarks fall short -- most cover only a limited set of programming languages and rely on less capable generative models.<n>We present CodeMirage, a benchmark that spans ten widely used programming languages and includes both original and paraphrased code samples.
arXiv Detail & Related papers (2025-05-27T03:25:12Z)
Could AI Trace and Explain the Origins of AI-Generated Images and Text? [53.11173194293537]
AI-generated content is increasingly prevalent in the real world. adversaries might exploit large multimodal models to create images that violate ethical or legal standards. Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv Detail & Related papers (2025-04-05T20:51:54Z)
Learning to Solve and Verify: A Self-Play Framework for Code and Test Generation [69.62857948698436]
Recent advances in large language models (LLMs) have improved their performance on coding benchmarks. However, improvement is plateauing due to the exhaustion of readily available high-quality data. We propose Sol-Ver, a self-play solver-verifier framework that jointly improves a single model's code and test generation capacity.
arXiv Detail & Related papers (2025-02-20T18:32:19Z)
Trust at Your Own Peril: A Mixed Methods Exploration of the Ability of Large Language Models to Generate Expert-Like Systems Engineering Artifacts and a Characterization of Failure Modes [0.0]
We present results from an empirical exploration, where a human expert-generated SE artifact was taken as a benchmark. We then adopted a two-fold mixed-methods approach to compare AI generated artifacts against the benchmark. We find that while the two-material appear very similar, AI generated artifacts exhibit serious failure modes that could be difficult to detect.
arXiv Detail & Related papers (2025-02-13T17:05:18Z)
Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score. It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models. Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
arXiv Detail & Related papers (2025-02-11T01:12:11Z)
Intelligent Green Efficiency for Intrusion Detection [0.0]
This paper presents an assessment of different programming languages and Feature Selection (FS) methods to improve performance of AI. Experiments were conducted using five ML models - Random Forest, XGBoost, LightGBM, Multi-Layer Perceptron, and Long Short-Term Memory. Results demonstrated that FS plays an important role enhancing the computational efficiency of AI models without compromising detection accuracy.
arXiv Detail & Related papers (2024-11-11T15:01:55Z)
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
MOSAIC: Multiple Observers Spotting AI Content, a Robust Approach to Machine-Generated Text Detection [35.67613230687864]
Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities. Various proposals have been made to automatically discriminate artificially generated from human-written texts. We derive a new, theoretically grounded approach to combine their respective strengths. Our experiments, using a variety of generator LLMs, suggest that our method effectively leads to robust detection performances.
arXiv Detail & Related papers (2024-09-11T20:55:12Z)
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants. Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation. CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details. Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)
A Preliminary Study on Using Large Language Models in Software Pentesting [2.0551676463612636]
Large language models (LLM) are perceived to offer promising potentials for automating security tasks. We investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code.
arXiv Detail & Related papers (2024-01-30T21:42:59Z)
SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment. We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers [14.018844722021896]
We study the specific patterns that characterize machine- and human-authored code. We propose DetectCodeGPT, a novel method for detecting machine-generated code.
arXiv Detail & Related papers (2024-01-12T09:15:20Z)
AI Content Self-Detection for Transformer-based Large Language Models [0.0]
This paper introduces the idea of direct origin detection and evaluates whether generative AI systems can recognize their output and distinguish it from human-written texts. Google's Bard model exhibits the largest capability of self-detection with an accuracy of 94%, followed by OpenAI's ChatGPT with 83%.
arXiv Detail & Related papers (2023-12-28T10:08:57Z)
Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications. The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
An Initial Look at Self-Reprogramming Artificial Intelligence [0.0]
We develop and experimentally validate the first fully self-reprogramming AI system. Applying AI-based computer code generation to AI itself, we implement an algorithm with the ability to continuously modify and rewrite its own neural network source code.
arXiv Detail & Related papers (2022-04-30T05:44:34Z)
Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering. In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.