Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits
- URL: http://arxiv.org/abs/2102.10496v1
- Date: Sun, 21 Feb 2021 03:13:27 GMT
- Title: Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits
- Authors: Jiawang Bai, Baoyuan Wu, Yong Zhang, Yiming Li, Zhifeng Li, Shu-Tao
Xia
- Abstract summary: We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
- Score: 55.740716446995805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To explore the vulnerability of deep neural networks (DNNs), many attack
paradigms have been well studied, such as the poisoning-based backdoor attack
in the training stage and the adversarial attack in the inference stage. In
this paper, we study a novel attack paradigm, which modifies model parameters
in the deployment stage for malicious purposes. Specifically, our goal is to
misclassify a specific sample into a target class without any sample
modification, while not significantly reduce the prediction accuracy of other
samples to ensure the stealthiness. To this end, we formulate this problem as a
binary integer programming (BIP), since the parameters are stored as binary
bits ($i.e.$, 0 and 1) in the memory. By utilizing the latest technique in
integer programming, we equivalently reformulate this BIP problem as a
continuous optimization problem, which can be effectively and efficiently
solved using the alternating direction method of multipliers (ADMM) method.
Consequently, the flipped critical bits can be easily determined through
optimization, rather than using a heuristic strategy. Extensive experiments
demonstrate the superiority of our method in attacking DNNs.
Related papers
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA)
For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach, known as adversarial training (AT), has been shown to mitigate robust training.
We propose a large-batch adversarial training framework implemented over multiple machines.
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Meta Adversarial Perturbations [66.43754467275967]
We show the existence of a meta adversarial perturbation (MAP)
MAP causes natural images to be misclassified with high probability after being updated through only a one-step gradient ascent update.
We show that these perturbations are not only image-agnostic, but also model-agnostic, as a single perturbation generalizes well across unseen data points and different neural network architectures.
arXiv Detail & Related papers (2021-11-19T16:01:45Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep networks (DNNs) by only perturbing a few pixels.
Recent efforts combine it with another l_infty perturbation on magnitudes.
We propose a homotopy algorithm to tackle the sparsity and neural perturbation framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - Improving Transformation-based Defenses against Adversarial Examples
with First-order Perturbations [16.346349209014182]
Studies show that neural networks are susceptible to adversarial attacks.
This exposes a potential threat to neural network-based intelligent systems.
We propose a method for counteracting adversarial perturbations to improve adversarial robustness.
arXiv Detail & Related papers (2021-03-08T06:27:24Z) - Online Adversarial Attacks [57.448101834579624]
We formalize the online adversarial attack problem, emphasizing two key elements found in real-world use-cases.
We first rigorously analyze a deterministic variant of the online threat model.
We then propose algoname, a simple yet practical algorithm yielding a provably better competitive ratio for $k=2$ over the current best single threshold algorithm.
arXiv Detail & Related papers (2021-03-02T20:36:04Z) - Regularisation Can Mitigate Poisoning Attacks: A Novel Analysis Based on
Multiobjective Bilevel Optimisation [3.3181276611945263]
Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance.
Optimal poisoning attacks, which can be formulated as bilevel problems, help to assess the robustness of learning algorithms in worst-case scenarios.
We show that this approach leads to an overly pessimistic view of the robustness of the algorithms.
arXiv Detail & Related papers (2020-02-28T19:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.