Related papers: Mossad: Defeating Software Plagiarism Detection

URL: http://arxiv.org/abs/2010.01700v1
Date: Sun, 4 Oct 2020 22:02:38 GMT
Title: Mossad: Defeating Software Plagiarism Detection
Authors: Breanna Devore-McDonald and Emery D. Berger
Abstract summary: This paper presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools. It comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors. Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection.
Score: 0.48225981108928456
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Automatic software plagiarism detection tools are widely used in educational settings to ensure that submitted work was not copied. These tools have grown in use together with the rise in enrollments in computer science programs and the widespread availability of code on-line. Educators rely on the robustness of plagiarism detection tools; the working assumption is that the effort required to evade detection is as high as that required to actually do the assigned work. This paper shows this is not the case. It presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools. Mossad comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors. Mossad is effective at defeating four plagiarism detectors, including Moss and JPlag. Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection. More insidiously, because of its non-deterministic approach, Mossad can, from a single program, generate dozens of variants, which are classified as no more suspicious than legitimate assignments. A detailed study of Mossad across a corpus of real student assignments demonstrates its efficacy at evading detection. A user study shows that graduate student assistants consistently rate Mossad-generated code as just as readable as authentic student code. This work motivates the need for both research on more robust plagiarism detection tools and greater integration of naturally plagiarism-resistant methodologies like code review into computer science education.

Related papers

On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates the resilience of multi-agent system structures under malicious agents. We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one. We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs or adding an agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
Discovering and exploring cases of educational source code plagiarism with Dolos [0.0]
Dolos is an ecosystem of tools for detecting and preventing plagiarism in educational source code. Educators can now run the entire plagiarism pipeline from a new web app in their browser. New dashboards provide an instant assessment of whether a collection of source files contains suspected cases of plagiarism.
arXiv Detail & Related papers (2024-02-16T17:47:11Z)
A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023. We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance. This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
Neural Language Models are Effective Plagiarists [38.85940137464184]
We find that a student using GPT-J can complete introductory-level programming assignments without triggering suspicion from MOSS. GPT-J was not trained on the problems in question and is not provided with any examples to work from. We conclude that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code.
arXiv Detail & Related papers (2022-01-19T04:00:46Z)
A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages [0.0]
This paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings. An in-depth examination of technical forms of plagiarism was also performed in the context of this study.
arXiv Detail & Related papers (2022-01-10T16:11:54Z)
Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts [0.0]
Hamtajoo is a Persian plagiarism detection system for academic manuscripts. We describe the overall structure of the system along with the algorithms used in each stage. In order to evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.
arXiv Detail & Related papers (2021-12-27T15:45:35Z)
Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can "steal" deployed models even when they have no training samples and cannot access the model parameters or structures. We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features. Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
The Struggle with Academic Plagiarism: Approaches based on Semantic Similarity [0.0]
We present a report of how semantic similarity measures can be used in the plagiarism detection task. Current software has proven to be successful; however, the problem of identifying paraphrasing or obfuscation plagiarism remains unresolved.
arXiv Detail & Related papers (2021-06-02T20:00:33Z)
MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We have proposed a deep learning-based network, termed MixNet, to detect presentation attacks. The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z)
Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes. We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.