Hindering Adversarial Attacks with Implicit Neural Representations
- URL: http://arxiv.org/abs/2210.13982v1
- Date: Sat, 22 Oct 2022 13:10:24 GMT
- Title: Hindering Adversarial Attacks with Implicit Neural Representations
- Authors: Andrei A. Rusu, Dan A. Calian, Sven Gowal, Raia Hadsell
- Abstract summary: Lossy Implicit Network Activation Coding (LINAC) defence successfully hinders several common adversarial attacks.
We devise a Parametric Bypass Approximation (PBA) attack strategy for key-based defences, which successfully invalidates an existing method in this category.
- Score: 25.422201099331637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce the Lossy Implicit Network Activation Coding (LINAC) defence, an
input transformation which successfully hinders several common adversarial
attacks on CIFAR-$10$ classifiers for perturbations up to $\epsilon = 8/255$ in
$L_\infty$ norm and $\epsilon = 0.5$ in $L_2$ norm. Implicit neural
representations are used to approximately encode pixel colour intensities in
$2\text{D}$ images such that classifiers trained on transformed data appear to
have robustness to small perturbations without adversarial training or large
drops in performance. The seed of the random number generator used to
initialise and train the implicit neural representation turns out to be
necessary information for stronger generic attacks, suggesting its role as a
private key. We devise a Parametric Bypass Approximation (PBA) attack strategy
for key-based defences, which successfully invalidates an existing method in
this category. Interestingly, our LINAC defence also hinders some transfer and
adaptive attacks, including our novel PBA strategy. Our results emphasise the
importance of a broad range of customised attacks despite apparent robustness
according to standard evaluations. LINAC source code and parameters of the
defended classifier evaluated throughout this submission are available at:
https://github.com/deepmind/linac
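The abstract describes fitting an implicit neural representation per image, with the random seed acting as a private key. Below is a minimal, hedged sketch of that idea in PyTorch: a small coordinate MLP is fit to one image under a seed-keyed initialisation, and a lossy re-encoding is returned. The architecture, width, step count, and the choice of returning the reconstruction rather than intermediate activations (the "activation coding" in LINAC's name) are illustrative assumptions, not the deepmind/linac implementation.

```python
# Sketch of a LINAC-style input transform (illustrative only, not the official
# deepmind/linac code): fit a small implicit neural representation to one image,
# with the RNG seed acting as the private key, and return a lossy re-encoding.
import torch
import torch.nn as nn

def linac_like_transform(image: torch.Tensor, seed: int, steps: int = 500) -> torch.Tensor:
    """image: (H, W, 3) float tensor in [0, 1]; seed acts as the private key."""
    torch.manual_seed(seed)                       # seed controls initialisation and training
    h, w, _ = image.shape
    # Normalised (x, y) coordinates in [-1, 1], one row per pixel.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)
    # Small coordinate MLP: (x, y) -> (r, g, b); sizes are assumptions.
    mlp = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 3), nn.Sigmoid())
    opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
    for _ in range(steps):                        # deliberately short, lossy fit
        opt.zero_grad()
        loss = ((mlp(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():                         # lossy re-encoding of the input image
        return mlp(coords).reshape(h, w, 3)
```

A classifier would then be trained and evaluated on such transformed inputs; without knowledge of the seed an attacker cannot reproduce the exact transformation, which is what motivates treating the seed as a private key and devising bypass strategies such as PBA.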
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples to any target class.
Our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z) - Robust width: A lightweight and certifiable adversarial defense [0.0]
Adversarial examples are intentionally constructed to cause the model to make incorrect predictions or classifications.
In this work, we study an adversarial defense based on the robust width property (RWP), which was recently introduced for compressed sensing.
We show that a specific input purification scheme based on the RWP gives theoretical robustness guarantees for images that are approximately sparse.
arXiv Detail & Related papers (2024-05-24T22:50:50Z) - DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks [4.734824660843964]
We introduce an encoding-based protection method against bit-flip attacks on neural networks, titled DeepNcode.
Our results show an increase in protection margin of up to $7.6\times$ for $4$-bit and $12.4\times$ for $8$-bit quantized networks.
arXiv Detail & Related papers (2024-05-22T18:01:34Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Post-Training Detection of Backdoor Attacks for Two-Class and
Multi-Attack Scenarios [22.22337220509128]
Backdoor attacks (BAs) are an emerging threat to deep neural network classifiers.
We propose a detection framework based on BP reverse-engineering and a novel expected transferability (ET) statistic.
arXiv Detail & Related papers (2022-01-20T22:21:38Z) - PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations.
Many defence methods have been proposed that attempt to improve robustness to adversarial noise.
However, evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - Patch-wise++ Perturbation for Adversarial Targeted Attacks [132.58673733817838]
We propose a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability.
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions (a simplified sketch of this update follows the list of related papers).
Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 35.9% for defense models and 32.7% for normally trained models.
arXiv Detail & Related papers (2020-12-31T08:40:42Z) - Composite Adversarial Attacks [57.293211764569996]
Adversarial attack is a technique for deceiving Machine Learning (ML) models.
In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms.
CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
arXiv Detail & Related papers (2020-12-10T03:21:16Z) - Are L2 adversarial examples intrinsically different? [14.77179227968466]
We unravel the properties that can intrinsically differentiate adversarial examples and normal inputs through theoretical analysis.
We achieve a recovered classification accuracy of up to 99% on MNIST, 89% on CIFAR, and 87% on ImageNet subsets against $L_2$ attacks.
arXiv Detail & Related papers (2020-02-28T03:42:52Z)
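As referenced in the Patch-wise++ entry above, the following is a rough sketch of a single patch-wise update with an amplification factor, where perturbation mass that overflows the $\epsilon$-ball is redistributed to neighbouring pixels. The uniform projection kernel, hyperparameter values, and clipping order are simplifying assumptions, not the authors' exact algorithm.

```python
# Hedged sketch of one patch-wise update in the spirit of PIM; kernel choice and
# hyperparameters (eps, step, amp, ksize) are illustrative assumptions.
import torch
import torch.nn.functional as F

def patchwise_step(x_adv, grad, x_clean, eps=8/255, step=2/255, amp=10.0, ksize=3):
    """One amplified signed-gradient step; the part of the perturbation that
    overflows the L_inf eps-ball is spread to neighbouring pixels."""
    candidate = x_adv + amp * step * grad.sign()          # amplified step size
    clipped = torch.clamp(candidate, x_clean - eps, x_clean + eps)
    overflow = candidate - clipped                        # mass exceeding the eps-constraint
    # Depthwise uniform "project kernel" redistributes the overflow locally.
    c = x_adv.size(1)
    kernel = torch.ones(c, 1, ksize, ksize, device=x_adv.device) / (ksize * ksize)
    spread = F.conv2d(overflow, kernel, padding=ksize // 2, groups=c)
    x_adv = clipped + spread                              # keep clipped part, add spread overflow
    return torch.clamp(x_adv, x_clean - eps, x_clean + eps).clamp(0.0, 1.0)
```

In the full attack this step would be iterated using the model's input gradient at each iteration; the paper's actual kernel, amplification schedule, and clipping details differ from this sketch.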
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.