SHIELD: Thwarting Code Authorship Attribution
- URL: http://arxiv.org/abs/2304.13255v1
- Date: Wed, 26 Apr 2023 02:55:28 GMT
- Title: SHIELD: Thwarting Code Authorship Attribution
- Authors: Mohammed Abuhamad and Changhun Jung and David Mohaisen and DaeHun Nyang
- Abstract summary: Authorship attribution has become increasingly accurate, posing a serious privacy risk for programmers who wish to remain anonymous.
We introduce SHIELD to examine the robustness of different code authorship attribution approaches against adversarial code examples.
- Score: 11.311401613087742
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Authorship attribution has become increasingly accurate, posing a serious
privacy risk for programmers who wish to remain anonymous. In this paper, we
introduce SHIELD to examine the robustness of different code authorship
attribution approaches against adversarial code examples. We define four
attacks on attribution techniques, which include targeted and non-targeted
attacks, and realize them using adversarial code perturbation. We experiment
with a dataset of 200 programmers from the Google Code Jam competition to
validate our methods against six state-of-the-art authorship attribution
methods that adopt a variety of techniques for extracting authorship traits
from source code, including RNN, CNN, and code stylometry. Our experiments
demonstrate the vulnerability of current authorship attribution methods to
adversarial attacks. For the non-targeted attack, the attack success rate
exceeds 98.5%, accompanied by a degradation of the identification confidence
that exceeds 13%. For the targeted attacks, we show the possibility of
impersonating a programmer using targeted adversarial perturbations, with
success rates ranging from 66% to 88% across the different authorship
attribution techniques under several adversarial scenarios.
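To make the attack setting concrete, the sketch below shows what a black-box, non-targeted code-perturbation loop of the kind described above could look like. It is a minimal illustration under stated assumptions, not SHIELD's implementation: the attribution model stub (predict), the author label, the toy transformation set, and the greedy search are all placeholders introduced for the example.

    # Minimal, hypothetical sketch of a non-targeted adversarial code-perturbation
    # attack against a code authorship classifier. All names below are illustrative
    # placeholders, not SHIELD's actual components.
    import random
    from typing import Callable, List, Tuple

    def predict(code: str) -> Tuple[str, float]:
        # Stand-in for a black-box RNN/CNN/stylometry-based attribution model:
        # returns (predicted_author, confidence).
        confidence = 0.9 - 0.2 * code.count("acc_value") - 0.1 * code.count("_unused_flag")
        return ("author_42", max(confidence, 0.05))

    def rename_identifier(code: str) -> str:
        # Semantics-preserving rename of a known identifier (toy transformation).
        return code.replace("result", "acc_value")

    def insert_dead_code(code: str) -> str:
        # Append a statement that does not change program behavior.
        return code + "_unused_flag = True\n"

    TRANSFORMS: List[Callable[[str], str]] = [rename_identifier, insert_dead_code]

    def untargeted_attack(code: str, budget: int = 50) -> str:
        """Greedily apply perturbations that lower the true author's confidence."""
        original_author, confidence = predict(code)
        for _ in range(budget):
            candidate = random.choice(TRANSFORMS)(code)
            author, conf = predict(candidate)
            if author != original_author:
                return candidate              # misattribution achieved
            if conf < confidence:             # keep perturbations that help
                code, confidence = candidate, conf
        return code

    if __name__ == "__main__":
        sample = "result = 0\nfor x in data:\n    result += x\n"
        print(untargeted_attack(sample))

A targeted (impersonation) variant would instead keep perturbations that increase the confidence the model assigns to a chosen victim author, stopping once the prediction matches that author.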
Related papers
- Evaluate-and-Purify: Fortifying Code Language Models Against Adversarial Attacks Using LLM-as-a-Judge [3.1656947459658813]
We show that over 80% of adversarial examples generated by identifier substitution attackers are actually detectable.
We propose EP-Shield, a unified framework for evaluating and purifying identifier substitution attacks.
arXiv Detail & Related papers (2025-04-28T12:28:55Z)
- Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification [1.0168443186928038]
We evaluate the adversarial robustness of authorship models (specifically an authorship verification model) to potent LLM-based attacks.
For both attacks, the objective is to mask or mimic the writing style of an author while preserving the original texts' semantics.
We achieve maximum attack success rates of 92% and 78% for obfuscation and impersonation attacks, respectively.
arXiv Detail & Related papers (2025-03-24T19:36:22Z)
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- LeapFrog: The Rowhammer Instruction Skip Attack [5.285478567449658]
We present a new type of Rowhammer gadget, called a LeapFrog gadget, which allows an adversary to subvert code execution.
The LeapFrog gadget manifests when the victim code stores the Program Counter (PC) value in the user or kernel stack.
This research also presents a systematic process to identify LeapFrog gadgets.
arXiv Detail & Related papers (2024-04-11T16:10:16Z)
- Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification [24.501269108193412]
Adversarial examples have proven to threaten speaker identification systems.
We propose a method to detect the presence of adversarial examples.
We also introduce a method for identifying the victim model on which the adversarial attack is carried out.
arXiv Detail & Related papers (2024-02-29T17:06:52Z)
- Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection [62.595450266262645]
This paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attack.
By embedding backdoors into models, attackers can deceive detectors into producing erroneous predictions for forged faces.
We propose the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors.
arXiv Detail & Related papers (2024-02-18T06:31:05Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce the novel problem of PRofiling Adversarial aTtacks (PRAT).
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use AID to devise a novel framework for the PRAT objective.
arXiv Detail & Related papers (2023-09-20T07:42:51Z)
- IDEA: Invariant Defense for Graph Adversarial Robustness [60.0126873387533]
We propose an Invariant causal DEfense method against adversarial Attacks (IDEA).
We derive node-based and structure-based invariance objectives from an information-theoretic perspective.
Experiments demonstrate that IDEA attains state-of-the-art defense performance under all five attacks on all five datasets.
arXiv Detail & Related papers (2023-05-25T07:16:00Z)
- Preserving Semantics in Textual Adversarial Attacks [0.0]
Up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics.
We propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE).
Our method outperforms existing sentence encoders used in adversarial attacks by achieving a 1.2x - 5.1x better real attack success rate.
arXiv Detail & Related papers (2022-11-08T12:40:07Z)
- An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework [22.259444589459513]
Malicious advertisement URLs pose a security risk since they are a source of cyber-attacks.
Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data.
In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to build a system for detecting fraudulent advertisement URLs.
arXiv Detail & Related papers (2022-04-27T20:06:22Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation [14.959517725033423]
Source code authorship attribution is an important problem often encountered in applications such as software forensics, bug fixing, and software quality analysis.
Recent studies show that current source code authorship attribution methods can be compromised by attackers exploiting adversarial examples and coding style manipulation.
We propose an innovative framework called Robust coding style Patterns Generation (RoPGen).
RoPGen essentially learns authors' unique coding style patterns that are hard for attackers to manipulate or imitate.
arXiv Detail & Related papers (2022-02-12T11:27:32Z)