Robust and Accurate Authorship Attribution via Program Normalization
- URL: http://arxiv.org/abs/2007.00772v3
- Date: Fri, 25 Feb 2022 20:45:35 GMT
- Title: Robust and Accurate Authorship Attribution via Program Normalization
- Authors: Yizhen Wang, Mohannad Alhanahnah, Ke Wang, Mihai Christodorescu,
Somesh Jha
- Abstract summary: Source code attribution approaches have achieved remarkable accuracy thanks to the rapid advances in deep learning.
However, recent studies show that they can be easily deceived by adversaries who attempt either to create a forgery of another author or to mask the original author.
We present a novel learning framework, $\textit{normalize-and-predict}$ ($\textit{N&P}$), that in theory guarantees the robustness of any authorship-attribution approach.
- Score: 24.381734600088453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Source code attribution approaches have achieved remarkable accuracy thanks
to the rapid advances in deep learning. However, recent studies shed light on
their vulnerability to adversarial attacks. In particular, they can be easily
deceived by adversaries who attempt either to create a forgery of another
author or to mask the original author. To address these emerging issues, we
formulate this security challenge into a general threat model, the
$\textit{relational adversary}$, that allows an arbitrary number of
semantics-preserving transformations to be applied to an input in any problem
space. Our theoretical investigation examines in depth the conditions for
robustness and the trade-off between robustness and accuracy. Motivated by these
insights, we present a novel learning framework,
$\textit{normalize-and-predict}$ ($\textit{N&P}$), that in theory guarantees
the robustness of any authorship-attribution approach. We conduct an extensive
evaluation of $\textit{N&P}$ in defending two of the latest
authorship-attribution approaches against state-of-the-art attack methods. Our
evaluation demonstrates that $\textit{N&P}$ improves the accuracy on
adversarial inputs by as much as 70% over the vanilla models. More importantly,
$\textit{N&P}$ also achieves robust accuracy as much as 45% higher than adversarial
training while running over 40 times faster.
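The abstract describes N&P only at a high level: map every program to a canonical form that is invariant under the attacker's semantics-preserving transformations, then train and predict on that canonical form. The following Python sketch illustrates the idea under the assumption that identifier renaming is one of the transformations being neutralized; the `normalize`, `Renamer`, and `predict_author` names and the renaming pass are illustrative choices, not the authors' implementation.

```python
import ast

def normalize(source: str) -> str:
    """Map a program to a canonical form so that semantics-preserving
    variants (here, programs differing only in identifier names)
    collapse to the same normalized input."""
    names: dict[str, str] = {}

    class Renamer(ast.NodeTransformer):
        # Hypothetical normalization pass: replace every identifier with a
        # positional token, removing one family of attribution-masking edits.
        def visit_Name(self, node: ast.Name) -> ast.Name:
            node.id = names.setdefault(node.id, f"v{len(names)}")
            return node

        def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
            node.name = names.setdefault(node.name, f"v{len(names)}")
            self.generic_visit(node)
            return node

    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)

def predict_author(source: str, model) -> str:
    # Training and inference both go through the same normalizer, so an
    # adversary restricted to the normalized-away transformations cannot
    # change the model's input, and therefore cannot change its prediction.
    return model.predict(normalize(source))
```

In this framing, the guarantee is only as strong as the normalizer: N&P is provably robust against exactly those transformations that the normalizer collapses to a single canonical representative, which is also where the robustness-accuracy trade-off discussed in the abstract arises.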
Related papers
- CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models [12.386141652094999]
Existing certified robustness based on random smoothing has shown considerable promise in certifying input-specific text perturbations.
A naive method is to simply increase the masking ratio and the likelihood of masking attack tokens, but it leads to a significant reduction in both certified accuracy and the certified radius.
We introduce a novel approach, designed to identify a superior prompt that maintains higher certified accuracy under extensive masking.
arXiv Detail & Related papers (2024-06-04T01:02:22Z) - Enhancing Adversarial Training via Reweighting Optimization Trajectory [72.75558017802788]
A number of approaches have been proposed to address such drawbacks, including extra regularization, adversarial weights, and training with more data.
We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time.
Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue.
arXiv Detail & Related papers (2023-06-25T15:53:31Z) - Raising the Bar for Certified Adversarial Robustness with Diffusion
Models [9.684141378657522]
In this work, we demonstrate that a similar approach can substantially improve deterministic certified defenses.
One of our main insights is that the difference between the training and test accuracy of the original model is a good predictor of the magnitude of the improvement.
Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the $\ell_2$ ($\epsilon = 36/255$) and $\ell_\infty$ ($\epsilon = 8/255$) threat models.
arXiv Detail & Related papers (2023-05-17T17:29:10Z) - Revisiting DeepFool: generalization and improvement [16.554225382392993]
We introduce a new family of adversarial attacks that strike a balance between effectiveness and computational efficiency.
Our proposed attacks are also suitable for evaluating the robustness of large models and can be used to perform adversarial training.
arXiv Detail & Related papers (2023-03-22T11:49:35Z) - ADDMU: Detection of Far-Boundary Adversarial Examples with Data and
Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks.
We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection.
Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z) - Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack [96.50202709922698]
A practical evaluation method should be convenient (i.e., parameter-free), efficient (i.e., fewer iterations) and reliable.
We propose a parameter-free Adaptive Auto Attack (A$^3$) evaluation method which addresses the efficiency and reliability in a test-time-training fashion.
arXiv Detail & Related papers (2022-03-10T04:53:54Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a $33.18$ BLEU score on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose a simple yet practical algorithm that yields a provably better competitive ratio for $k=2$ than the current best single-threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z) - Robustness, Privacy, and Generalization of Adversarial Training [84.38148845727446]
This paper establishes and quantifies the privacy-robustness trade-off and generalization-robustness trade-off in adversarial training.
We show that adversarial training is $(\varepsilon, \delta)$-differentially private, where the magnitude of the differential privacy has a positive correlation with the robustified intensity.
Our generalization bounds do not explicitly rely on the parameter size which would be large in deep learning.
arXiv Detail & Related papers (2020-12-25T13:35:02Z) - RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
arXiv Detail & Related papers (2020-10-19T17:06:18Z) - Reliable evaluation of adversarial robustness with an ensemble of
diverse parameter-free attacks [65.20660287833537]
In this paper we propose two extensions of the PGD-attack overcoming failures due to suboptimal step size and problems of the objective function.
We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness.
arXiv Detail & Related papers (2020-03-03T18:15:55Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)