Robust and Accurate Authorship Attribution via Program Normalization
- URL: http://arxiv.org/abs/2007.00772v3
- Date: Fri, 25 Feb 2022 20:45:35 GMT
- Title: Robust and Accurate Authorship Attribution via Program Normalization
- Authors: Yizhen Wang, Mohannad Alhanahnah, Ke Wang, Mihai Christodorescu,
Somesh Jha
- Abstract summary: Source code attribution approaches have achieved remarkable accuracy thanks to the rapid advances in deep learning.
However, recent studies show that they can be easily deceived by adversaries who attempt either to create a forgery of another author or to mask the original author.
We present a novel learning framework, $\textit{normalize-and-predict}$ ($\textit{N&P}$), that in theory guarantees the robustness of any authorship-attribution approach.
- Score: 24.381734600088453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code attribution approaches have achieved remarkable accuracy thanks
to the rapid advances in deep learning. However, recent studies shed light on
their vulnerability to adversarial attacks. In particular, they can be easily
deceived by adversaries who attempt either to create a forgery of another
author or to mask the original author. To address these emerging issues, we
formulate this security challenge into a general threat model, the
$\textit{relational adversary}$, that allows an arbitrary number of
semantics-preserving transformations to be applied to an input in any problem
space. Our theoretical investigation examines in depth the conditions for
robustness and the trade-off between robustness and accuracy. Motivated by these
insights, we present a novel learning framework,
$\textit{normalize-and-predict}$ ($\textit{N&P}$), that in theory guarantees
the robustness of any authorship-attribution approach. We conduct an extensive
evaluation of $\textit{N&P}$ in defending two of the latest
authorship-attribution approaches against state-of-the-art attack methods. Our
evaluation demonstrates that $\textit{N&P}$ improves the accuracy on
adversarial inputs by as much as 70% over the vanilla models. More importantly,
$\textit{N&P}$ also achieves robust accuracy as much as 45% higher than adversarial
training while running over 40 times faster.
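The abstract describes N&P only at a high level: map every program to a canonical form that is invariant under the attacker's semantics-preserving transformations, then train and predict on that canonical form. The following Python sketch illustrates the idea under the assumption that identifier renaming is one of the transformations being neutralized; the `normalize`, `Renamer`, and `predict_author` names and the renaming pass are illustrative choices, not the authors' implementation.

```python
import ast

def normalize(source: str) -> str:
    """Map a program to a canonical form so that semantics-preserving
    variants (here, programs differing only in identifier names)
    collapse to the same normalized input."""
    names: dict[str, str] = {}

    class Renamer(ast.NodeTransformer):
        # Hypothetical normalization pass: replace every identifier with a
        # positional token, removing one family of attribution-masking edits.
        def visit_Name(self, node: ast.Name) -> ast.Name:
            node.id = names.setdefault(node.id, f"v{len(names)}")
            return node

        def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
            node.name = names.setdefault(node.name, f"v{len(names)}")
            self.generic_visit(node)
            return node

    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)

def predict_author(source: str, model) -> str:
    # Training and inference both go through the same normalizer, so an
    # adversary restricted to the normalized-away transformations cannot
    # change the model's input, and therefore cannot change its prediction.
    return model.predict(normalize(source))
```

In this framing, the guarantee is only as strong as the normalizer: N&P is provably robust against exactly those transformations that the normalizer collapses to a single canonical representative, which is also where the robustness-accuracy trade-off discussed in the abstract arises.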
Related papers
- CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models [12.386141652094999]
Existing certified robustness based on random smoothing has shown considerable promise in certifying input-specific text perturbations.
A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens, but it leads to a significant reduction in both certified accuracy and the certified radius.
We introduce a novel approach, designed to identify a superior prompt that maintains higher certified accuracy under extensive masking.
arXiv Detail & Related papers (2024-06-04T01:02:22Z) - Enhancing Adversarial Training via Reweighting Optimization Trajectory [72.75558017802788]
A number of approaches have been proposed to address such drawbacks, including extra regularization, adversarial weights, and training with more data.
We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time.
Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue.
arXiv Detail & Related papers (2023-06-25T15:53:31Z) - Raising the Bar for Certified Adversarial Robustness with Diffusion
Models [9.684141378657522]
In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
One of our main insights is that the difference between the training and test accuracy of the original model is a good predictor of the magnitude of the improvement.
Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models.
arXiv Detail & Related papers (2023-05-17T17:29:10Z) - Revisiting DeepFool: generalization and improvement [16.554225382392993]
We introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency.
Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training.
arXiv Detail & Related papers (2023-03-22T11:49:35Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack [96.50202709922698]
A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable.
We propose a parameter-free Adaptive Auto Attack (A$^3$) evaluation method which addresses the efficiency and reliability in a test-time-training fashion.
arXiv Detail & Related papers (2022-03-10T04:53:54Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a $33.18$ BLEU score on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm that yields a provably better competitive ratio for $k=2$ than the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z) - Robustness, Privacy, and Generalization of Adversarial Training [84.38148845727446]
This paper establishes and quantifies the privacy-robustness trade-off and generalization-robustness trade-off in adversarial training.
We show that adversarial training is $(\varepsilon, \delta)$-differentially private, where the magnitude of the differential privacy has a positive correlation with the robustified intensity.
Our generalization bounds do not explicitly rely on the parameter size which would be large in deep learning.
arXiv Detail & Related papers (2020-12-25T13:35:02Z) - RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z) - Reliable evaluation of adversarial robustness with an ensemble of
diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)