Evaluating Software Plagiarism Detection in the Age of AI: Automated Obfuscation and Lessons for Academic Integrity
- URL: http://arxiv.org/abs/2505.20158v1
- Date: Mon, 26 May 2025 15:59:01 GMT
- Title: Evaluating Software Plagiarism Detection in the Age of AI: Automated Obfuscation and Lessons for Academic Integrity
- Authors: Timur Sağlam, Larissa Schmid,
- Abstract summary: Plagiarism in programming assignments is a persistent issue in computer science education. Software plagiarism detectors are widely used to identify suspicious similarities at scale. However, they are vulnerable to advanced obfuscation based on structural modification of program code.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Plagiarism in programming assignments is a persistent issue in computer science education, increasingly complicated by the emergence of automated obfuscation attacks. While software plagiarism detectors are widely used to identify suspicious similarities at scale and are resilient to simple obfuscation techniques, they are vulnerable to advanced obfuscation based on structural modification of program code that preserves the original program behavior. While different defense mechanisms have been proposed to increase resilience against these attacks, their current evaluation is limited to the scope of attacks used and lacks a comprehensive investigation regarding AI-based obfuscation. In this paper, we investigate the resilience of these defense mechanisms against a broad range of automated obfuscation attacks, including both algorithmic and AI-generated methods, and for a wide variety of real-world datasets. We evaluate the improvements of two defense mechanisms over the plagiarism detector JPlag across over four million pairwise program comparisons. Our results show significant improvements in detecting obfuscated plagiarism instances, and we observe an improved detection of AI-generated programs, even though the defense mechanisms are not designed for this use case. Based on our findings, we provide an in-depth discussion of their broader implications for academic integrity and the role of AI in education.
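To make the scale of "over four million pairwise program comparisons" concrete, the sketch below shows the kind of all-pairs, token-based similarity computation that drives such evaluations. It is a minimal illustration only: JPlag itself parses programs into language-specific token streams and matches them with greedy string tiling, and the helper names used here (`ngrams`, `similarity`, `all_pairs`) are assumptions for this sketch, not part of the paper or of JPlag.

```python
# Minimal sketch (not the paper's or JPlag's implementation): token n-gram
# overlap as a stand-in for a structural similarity measure, computed for
# every pair of submissions.
from typing import Dict, List, Set, Tuple


def ngrams(tokens: List[str], n: int = 4) -> Set[Tuple[str, ...]]:
    """Set of token n-grams of a program's (already tokenized) source."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: List[str], b: List[str], n: int = 4) -> float:
    """Jaccard overlap of token n-grams as a crude pairwise similarity."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga and gb else 0.0


def all_pairs(programs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Compare every submission with every other one (quadratic in count)."""
    names = sorted(programs)
    return {
        (x, y): similarity(programs[x], programs[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
```

Because the number of comparisons grows quadratically with the number of submissions, even moderately sized datasets quickly reach millions of program pairs, which is why obfuscation resilience has to hold up at scale.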
Related papers
- Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors [65.27124213266491]
We propose Contrastive Paraphrase Attack (CoPA), a training-free method that effectively deceives text detectors. CoPA constructs an auxiliary machine-like word distribution as a contrast to the human-like distribution generated by large language models. Our theoretical analysis suggests the superiority of the proposed attack.
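As a rough, illustrative analogue of the contrast described above (not CoPA's actual algorithm), the toy example below re-weights a model's next-token distribution against an auxiliary "machine-like" distribution so that tokens the auxiliary distribution favors are suppressed. The vocabulary, probabilities, and the `alpha` weight are invented for illustration; no real language model is involved.

```python
# Toy contrastive re-weighting of next-token probabilities; all numbers are
# made up for illustration.
import numpy as np

vocab = ["the", "utilize", "use", "delve", "plain"]
p_model = np.array([0.30, 0.25, 0.15, 0.20, 0.10])    # model's own distribution
p_machine = np.array([0.20, 0.35, 0.05, 0.30, 0.10])  # auxiliary "machine-like" distribution

alpha = 1.0  # contrast strength (assumed hyperparameter)
scores = np.log(p_model) - alpha * np.log(p_machine)  # penalize machine-preferred tokens

probs = np.exp(scores - scores.max())
probs /= probs.sum()                     # renormalized sampling distribution
print(vocab[int(np.argmax(probs))])      # -> "use": favored by the model, not "machine-like"
```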
arXiv Detail & Related papers (2025-05-21T10:08:39Z)
- The Failure of Plagiarism Detection in Competitive Programming [0.0]
Plagiarism in programming courses remains a persistent challenge. This paper examines why traditional code plagiarism detection methods frequently fail in competitive programming contexts. We find that widely-used automated similarity checkers can be thwarted by simple code transformations or novel AI-generated code.
arXiv Detail & Related papers (2025-05-13T05:43:49Z)
- The Code Barrier: What LLMs Actually Understand? [7.407441962359689]
This research uses code obfuscation as a structured testing framework to evaluate the semantic understanding capabilities of language models. Findings show a statistically significant performance decline as obfuscation complexity increases. The work also introduces a new evaluation approach for assessing code comprehension in language models.
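As a hint of what obfuscation-based probing can look like in practice, here is a generic sketch (not the paper's framework) that renames identifiers in a Python snippet while preserving behavior; a study would then compare a model's explanations on the original versus the obfuscated version. The `Renamer` class and the example function are assumptions made for this illustration.

```python
# Generic identifier-renaming obfuscation using Python's ast module
# (illustrative only; behavior-preserving for this simple case).
import ast


class Renamer(ast.NodeTransformer):
    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def _rename(self, old: str) -> str:
        return self.mapping.setdefault(old, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        node.name = self._rename(node.name)   # obscure the function name
        self.generic_visit(node)
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        node.arg = self._rename(node.arg)     # obscure parameter names
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Rename locals/parameters; leave unknown names (e.g. builtins) alone.
        if node.id in self.mapping or isinstance(node.ctx, ast.Store):
            node.id = self._rename(node.id)
        return node


src = "def average(values):\n    total = sum(values)\n    return total / len(values)\n"
print(ast.unparse(Renamer().visit(ast.parse(src))))
# def v0(v1):
#     v2 = sum(v1)
#     return v2 / len(v1)
```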
arXiv Detail & Related papers (2025-04-14T14:11:26Z)
- Hierarchical Manifold Projection for Ransomware Detection: A Novel Geometric Approach to Identifying Malicious Encryption Patterns [0.0]
Encryption-based cyber threats continue to evolve, employing increasingly sophisticated techniques to bypass traditional detection mechanisms. A novel classification framework structured through hierarchical manifold projection introduces a mathematical approach to detecting malicious encryption. The proposed methodology transforms encryption sequences into structured manifold embeddings, ensuring classification robustness through non-Euclidean feature separability.
arXiv Detail & Related papers (2025-02-11T23:20:58Z)
- Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions [51.51850981481236]
We introduce POATE, a novel jailbreak technique that harnesses contrastive reasoning to provoke unethical responses. POATE crafts semantically opposing intents and integrates them with adversarial templates, steering models toward harmful outputs with remarkable subtlety. To counter this, we propose Intent-Aware CoT and Reverse Thinking CoT, which decompose queries to detect malicious intent and reason in reverse to evaluate and reject harmful responses.
arXiv Detail & Related papers (2025-01-03T15:40:03Z)
- Automated Hardware Logic Obfuscation Framework Using GPT [3.1789948141373077]
We introduce Obfus-chat, a novel framework leveraging Generative Pre-trained Transformer (GPT) models to automate the obfuscation process.
The proposed framework accepts hardware design netlists and key sizes as inputs, and autonomously generates obfuscated code tailored to enhance security.
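A hypothetical sketch of the kind of single GPT-driven obfuscation step such a framework might perform follows (not the Obfus-chat implementation; the prompt wording, model name, and function are illustrative assumptions, and only the standard OpenAI chat-completions call is real):

```python
# Hypothetical GPT-assisted obfuscation step; prompt and model choice are
# illustrative assumptions, not the paper's framework.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def obfuscate_netlist(netlist: str, key_size: int) -> str:
    prompt = (
        f"Insert a {key_size}-bit key-based logic-locking scheme into the "
        "following gate-level netlist while preserving its functional "
        "behavior for the correct key:\n\n" + netlist
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```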
arXiv Detail & Related papers (2024-05-20T17:33:00Z)
- Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack [24.954755569786396]
We propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection.
We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness.
The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content.
arXiv Detail & Related papers (2024-04-02T12:49:22Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
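The general de-randomized smoothing pattern behind such a defense can be sketched as below; this is a simplified illustration rather than DRSM's implementation, and the `base_classify` callable, window size, and certification reasoning are assumptions stated for the sketch.

```python
# Simplified de-randomized smoothing by window ablation: classify each byte
# window in isolation and take a majority vote, so an adversarial payload can
# only influence the few windows it overlaps.
from collections import Counter
from typing import Callable, List


def smoothed_classify(
    data: bytes,
    base_classify: Callable[[bytes], str],  # assumed per-window base classifier
    window: int = 512,
) -> str:
    votes: List[str] = [
        base_classify(data[i:i + window]) for i in range(0, len(data), window)
    ]
    ranked = Counter(votes).most_common()
    label, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Intuition for certification: an attack confined to k windows changes at
    # most k votes, so the prediction cannot flip while top - runner_up > 2*k
    # (DRSM's exact certificate differs in its details).
    return label


# Toy demo with a dummy byte-frequency "classifier".
demo = bytes([0x90] * 4096)
print(smoothed_classify(demo, lambda w: "benign" if w.count(0x90) > len(w) // 2 else "malicious"))
```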
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Attack Agnostic Adversarial Defense via Visual Imperceptible Bound [70.72413095698961]
This research aims to design a defense model that is robust within a certain bound against both seen and unseen adversarial attacks.
The proposed defense model is evaluated on the MNIST, CIFAR-10, and Tiny ImageNet databases.
The proposed algorithm is attack agnostic, i.e. it does not require any knowledge of the attack algorithm.
arXiv Detail & Related papers (2020-10-25T23:14:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.