Hindering Adversarial Attacks with Implicit Neural Representations
- URL: http://arxiv.org/abs/2210.13982v1
- Date: Sat, 22 Oct 2022 13:10:24 GMT
- Title: Hindering Adversarial Attacks with Implicit Neural Representations
- Authors: Andrei A. Rusu, Dan A. Calian, Sven Gowal, Raia Hadsell
- Abstract summary: Lossy Implicit Network Activation Coding (LINAC) defence successfully hinders several common adversarial attacks.
We devise a Parametric Bypass Approximation (PBA) attack strategy for key-based defences, which successfully invalidates an existing method in this category.
- Score: 25.422201099331637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an
input transformation which successfully hinders several common adversarial
attacks on CIFAR-$10$ classifiers for perturbations up to $\epsilon = 8/255$ in
$L_\infty$ norm and $\epsilon = 0.5$ in $L_2$ norm. Implicit neural
representations are used to approximately encode pixel colour intensities in
$2\text{D}$ images such that classifiers trained on transformed data appear to
have robustness to small perturbations without adversarial training or large
drops in performance. The seed of the random number generator used to
initialise and train the implicit neural representation turns out to be
necessary information for stronger generic attacks, suggesting its role as a
private key. We devise a Parametric Bypass Approximation (PBA) attack strategy
for key-based defences, which successfully invalidates an existing method in
this category. Interestingly, our LINAC defence also hinders some transfer and
adaptive attacks, including our novel PBA strategy. Our results emphasise the
importance of a broad range of customised attacks despite apparent robustness
according to standard evaluations. LINAC source code and parameters of the
defended classifier evaluated throughout this submission are available at:
https://github.com/deepmind/linac
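The abstract describes fitting an implicit neural representation per image, with the random seed acting as a private key. Below is a minimal, hedged sketch of that idea in PyTorch: a small coordinate MLP is fit to one image under a seed-keyed initialisation, and a lossy re-encoding is returned. The architecture, width, step count, and the choice of returning the reconstruction rather than intermediate activations (the "activation coding" in LINAC's name) are illustrative assumptions, not the deepmind/linac implementation.

```python
# Sketch of a LINAC-style input transform (illustrative only, not the official
# deepmind/linac code): fit a small implicit neural representation to one image,
# with the RNG seed acting as the private key, and return a lossy re-encoding.
import torch
import torch.nn as nn

def linac_like_transform(image: torch.Tensor, seed: int, steps: int = 500) -> torch.Tensor:
    """image: (H, W, 3) float tensor in [0, 1]; seed acts as the private key."""
    torch.manual_seed(seed)                       # seed controls initialisation and training
    h, w, _ = image.shape
    # Normalised (x, y) coordinates in [-1, 1], one row per pixel.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)
    # Small coordinate MLP: (x, y) -> (r, g, b); sizes are assumptions.
    mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 3), nn.Sigmoid())
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):                        # deliberately short, lossy fit
        opt.zero_grad()
        loss = ((mlp(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():                         # lossy re-encoding of the input image
        return mlp(coords).reshape(h, w, 3)
```

A classifier would then be trained and evaluated on such transformed inputs; without knowledge of the seed an attacker cannot reproduce the exact transformation, which is what motivates treating the seed as a private key and devising bypass strategies such as PBA.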
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples to any target class.
Our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Robust width: A lightweight and certifiable adversarial defense [0.0]
Adversarial examples are intentionally constructed to cause the model to make incorrect predictions or classifications.
In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing.
We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse.
arXiv Detail & Related papers (2024-05-24T22:50:50Z) - DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks [4.734824660843964]
We introduce an encoding-based protection method against bit-flip attacks on neural networks, titled DeepNcode.
Our results show an increase in protection margin of up to $7.6\times$ for $4$-bit and $12.4\times$ for $8$-bit quantized networks.
arXiv Detail & Related papers (2024-05-22T18:01:34Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z) - PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
However, evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - Patch-wise++ Perturbation for Adversarial Targeted Attacks [132.58673733817838]
We propose a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability.
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions (a simplified sketch of this update follows the list of related papers).
Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 35.9% for defense models and 32.7% for normally trained models.
arXiv Detail & Related papers (2020-12-31T08:40:42Z) - Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)
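As referenced in the Patch-wise++ entry above, the following is a rough sketch of a single patch-wise update with an amplification factor, where perturbation mass that overflows the $\epsilon$-ball is redistributed to neighbouring pixels. The uniform projection kernel, hyperparameter values, and clipping order are simplifying assumptions, not the authors' exact algorithm.

```python
# Hedged sketch of one patch-wise update in the spirit of PIM; kernel choice and
# hyperparameters (eps, step, amp, ksize) are illustrative assumptions.
import torch
import torch.nn.functional as F

def patchwise_step(x_adv, grad, x_clean, eps=8/255, step=2/255, amp=10.0, ksize=3):
    """One amplified signed-gradient step; the part of the perturbation that
    overflows the L_inf eps-ball is spread to neighbouring pixels."""
    candidate = x_adv + amp * step * grad.sign()          # amplified step size
    clipped = torch.clamp(candidate, x_clean - eps, x_clean + eps)
    overflow = candidate - clipped                        # mass exceeding the eps-constraint
    # Depthwise uniform "project kernel" redistributes the overflow locally.
    c = x_adv.size(1)
    kernel = torch.ones(c, 1, ksize, ksize, device=x_adv.device) / (ksize * ksize)
    spread = F.conv2d(overflow, kernel, padding=ksize // 2, groups=c)
    x_adv = clipped + spread                              # keep clipped part, add spread overflow
    return torch.clamp(x_adv, x_clean - eps, x_clean + eps).clamp(0.0, 1.0)
```

In the full attack this step would be iterated using the model's input gradient at each iteration; the paper's actual kernel, amplification schedule, and clipping details differ from this sketch.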
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.