Mossad: Defeating Software Plagiarism Detection
- URL: http://arxiv.org/abs/2010.01700v1
- Date: Sun, 4 Oct 2020 22:02:38 GMT
- Title: Mossad: Defeating Software Plagiarism Detection
- Authors: Breanna Devore-McDonald and Emery D. Berger
- Abstract summary: This paper presents an entirely automatic program transformation approach, Mossad, that defeats popular software plagiarism detection tools.
It comprises a framework that couples techniques inspired by genetic programming with domain-specific knowledge to effectively undermine plagiarism detectors.
Mossad is both fast and effective: it can, in minutes, generate modified versions of programs that are likely to escape detection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic software plagiarism detection tools are widely used in educational
settings to ensure that submitted work was not copied. These tools have grown
in use together with the rise in enrollments in computer science programs and
the widespread availability of code on-line. Educators rely on the robustness
of plagiarism detection tools; the working assumption is that the effort
required to evade detection is as high as that required to actually do the
assigned work.
This paper shows this is not the case. It presents an entirely automatic
program transformation approach, Mossad, that defeats popular software
plagiarism detection tools. Mossad comprises a framework that couples
techniques inspired by genetic programming with domain-specific knowledge to
effectively undermine plagiarism detectors. Mossad is effective at defeating
four plagiarism detectors, including Moss and JPlag. Mossad is both fast and
effective: it can, in minutes, generate modified versions of programs that are
likely to escape detection. More insidiously, because of its non-deterministic
approach, Mossad can, from a single program, generate dozens of variants, which
are classified as no more suspicious than legitimate assignments. A detailed
study of Mossad across a corpus of real student assignments demonstrates its
efficacy at evading detection. A user study shows that graduate student
assistants consistently rate Mossad-generated code as just as readable as
authentic student code. This work motivates the need for both research on more
robust plagiarism detection tools and greater integration of naturally
plagiarism-resistant methodologies like code review into computer science
education.
Related papers
- On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates the resilience of multi-agent system structures under malicious agents.
We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one.
We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs, or an additional agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
- Discovering and exploring cases of educational source code plagiarism with Dolos [0.0]
Dolos is an ecosystem of tools for detecting and preventing plagiarism in educational source code.
Educators can now run the entire plagiarism pipeline from a new web app in their browser.
New dashboards provide an instant assessment of whether a collection of source files contains suspected cases of plagiarism.
arXiv Detail & Related papers (2024-02-16T17:47:11Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks.
We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning.
In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Neural Language Models are Effective Plagiarists [38.85940137464184]
We find that a student using GPT-J can complete introductory level programming assignments without triggering suspicion from MOSS.
GPT-J was not trained on the problems in question and is not provided with any examples to work from.
We conclude that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code.
arXiv Detail & Related papers (2022-01-19T04:00:46Z)
- A Survey of Plagiarism Detection Systems: Case of Use with English, French and Arabic Languages [0.0]
This paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings.
An in-depth examination of technical forms of plagiarism was also performed in the context of this study.
arXiv Detail & Related papers (2022-01-10T16:11:54Z)
- Hamtajoo: A Persian Plagiarism Checker for Academic Manuscripts [0.0]
Hamtajoo is a Persian plagiarism detection system for academic manuscripts.
We describe the overall structure of the system along with the algorithms used in each stage.
To evaluate the performance of the proposed system, we used a plagiarism detection corpus that complies with the PAN standards.
arXiv Detail & Related papers (2021-12-27T15:45:35Z)
- Defending against Model Stealing via Verifying Embedded External Features [90.29429679125508]
Adversaries can 'steal' deployed models even when they have no training samples and cannot access the model parameters or structures.
We explore the defense from another angle by verifying whether a suspicious model contains the knowledge of defender-specified external features.
Our method is effective in detecting different types of model stealing simultaneously, even if the stolen model is obtained via a multi-stage stealing process.
arXiv Detail & Related papers (2021-12-07T03:51:54Z)
- The Struggle with Academic Plagiarism: Approaches based on Semantic Similarity [0.0]
We present a report of how semantic similarity measures can be used in the plagiarism detection task.
Current software has proven to be successful; however, the problem of identifying paraphrasing or obfuscation plagiarism remains unresolved.
arXiv Detail & Related papers (2021-06-02T20:00:33Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)