Depth-2 Neural Networks Under a Data-Poisoning Attack
- URL: http://arxiv.org/abs/2005.01699v3
- Date: Wed, 29 Jun 2022 17:31:39 GMT
- Title: Depth-2 Neural Networks Under a Data-Poisoning Attack
- Authors: Sayar Karmakar, Anirbit Mukherjee and Theodore Papamarkou
- Abstract summary: We study the possibility of defending against data-poisoning attacks while training a shallow neural network in a regression setup.
In this work, we focus on doing supervised learning for a class of depth-2 finite-width neural networks.
- Score: 2.105564340986074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study the possibility of defending against data-poisoning
attacks while training a shallow neural network in a regression setup. We focus
on doing supervised learning for a class of depth-2 finite-width neural
networks, which includes single-filter convolutional networks. In this class of
networks, we attempt to learn the network weights in the presence of a
malicious oracle doing stochastic, bounded and additive adversarial distortions
on the true output during training. For the non-gradient stochastic algorithm
that we construct, we prove worst-case near-optimal trade-offs among the
magnitude of the adversarial attack, the weight approximation accuracy, and the
confidence achieved by the proposed algorithm. As our algorithm uses
mini-batching, we analyze how the mini-batch size affects convergence. We also
show how to utilize the scaling of the outer layer weights to counter
output-poisoning attacks depending on the probability of attack. Lastly, we
give experimental evidence demonstrating how our algorithm outperforms
stochastic gradient descent under different input data distributions, including
instances of heavy-tailed distributions.
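To make the setup concrete, the following is a minimal numerical sketch of the training problem the abstract describes: true regression labels from a depth-2, single-filter ReLU network are corrupted by a bounded additive distortion applied with some probability, and the weights are recovered with a non-gradient, Tron-style mini-batch iteration. The network form, attack parameters, and update rule below are illustrative assumptions reconstructed from the abstract alone, not the authors' exact algorithm or guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative problem setup (assumptions, not the paper's exact construction) ---
d, n = 10, 5000                    # input dimension, number of samples
w_star = rng.normal(size=d)        # ground-truth filter weights
relu = lambda z: np.maximum(z, 0.0)

def net(w, X):
    """Depth-2, single-filter net with a fixed outer layer (illustrative)."""
    return relu(X @ w)

X = rng.normal(size=(n, d))        # Gaussian inputs; the paper also covers heavy-tailed cases
y_clean = net(w_star, X)

# --- Bounded, additive, stochastic output-poisoning oracle (hedged model of the attack) ---
theta, beta = 0.1, 0.5             # maximum distortion magnitude, probability of attack
attacked = rng.random(n) < beta
y = y_clean + attacked * rng.uniform(-theta, theta, size=n)

# --- Tron-style (non-gradient) mini-batch iteration: an illustrative sketch ---
w, eta, b = np.zeros(d), 0.05, 32  # initial weights, step size, mini-batch size
for t in range(2000):
    idx = rng.choice(n, size=b, replace=False)
    Xb, yb = X[idx], y[idx]
    # The update direction is residual times input; unlike the SGD direction for the
    # squared loss, it does not multiply by the ReLU derivative 1{Xb @ w > 0}.
    w += eta * (Xb.T @ (yb - net(w, Xb))) / b

print("parameter estimation error:", np.linalg.norm(w - w_star))
```

The mini-batch size b and the attack parameters (theta, beta) are the knobs whose trade-offs the paper analyzes; the printed weight-approximation error is the quantity that the stated guarantees bound, up to a floor set by the magnitude of the adversarial distortion.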
Related papers
- On Neural Network approximation of ideal adversarial attack and
convergence of adversarial training [3.553493344868414]
Adversarial attacks are usually expressed in terms of a gradient-based operation on the input data and model.
In this work, we solidify the idea of representing adversarial attacks as a trainable function, without further computation.
arXiv Detail & Related papers (2023-07-30T01:04:36Z) - Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA)
For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
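For reference, the Wasserstein distributionally robust training objective alluded to above can be written generically as

$$ \inf_{\theta} \; \sup_{Q \,:\, W_p(Q, P) \le \delta} \; \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_\theta(x), y)\big], $$

where $P$ is the data distribution, $W_p$ a Wasserstein distance of order $p$, and $\delta$ the adversary's budget; this is the standard DRO formulation, not necessarily the paper's exact statement.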
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - On the Robustness of Bayesian Neural Networks to Adversarial Attacks [11.277163381331137]
Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications.
We show that vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution.
We prove that the expected gradient of the loss with respect to the BNN posterior distribution is vanishing, even when each neural network sampled from the posterior is vulnerable to gradient-based attacks.
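Stated in symbols, the claim above is that the posterior-averaged input gradient of the loss vanishes,

$$ \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\big[\nabla_x \ell(x, y; \theta)\big] \;\approx\; 0, $$

even though $\nabla_x \ell(x, y; \theta)$ can be large for individual networks $\theta$ drawn from the posterior (generic notation, paraphrasing the abstract).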
arXiv Detail & Related papers (2022-07-13T12:27:38Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has been shown to be an effective approach.
We propose a large-batch adversarial training framework implemented over multiple machines.
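The objective being scaled up is the usual min-max adversarial training problem, stated generically as

$$ \min_{\theta} \; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|_p \le \epsilon} \ell\big(f_\theta(x+\delta), y\big)\Big], $$

which the proposed framework tackles with large mini-batches split across multiple machines.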
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection.
We propose a novel defense method that consists of "truncation" and "adversarial training".
Motivated by the insights we obtain, we extend these components to neural network classifiers.
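Concretely, a sparse threat model of this kind constrains the attacker to perturbations with at most $k$ nonzero coordinates,

$$ \mathcal{A}_k(x) = \{\, x + \delta \;:\; \|\delta\|_0 \le k \,\}, $$

with the magnitude of each changed coordinate otherwise unrestricted; this is the generic $\ell_0$ formulation rather than the paper's exact definition.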
arXiv Detail & Related papers (2022-01-23T21:18:17Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
We formulate the attack as a binary integer programming (BIP) problem and, by utilizing the latest techniques in integer programming, equivalently reformulate it as a continuous optimization problem.
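One standard identity behind such reformulations (shown here generically, not necessarily the paper's exact construction) replaces the binary constraint with a box intersected with a sphere,

$$ v \in \{0,1\}^n \;\Longleftrightarrow\; v \in [0,1]^n \ \text{and}\ \Big\|v - \tfrac{1}{2}\mathbf{1}\Big\|_2^2 = \tfrac{n}{4}, $$

after which the equality constraint can be handled by continuous solvers such as ADMM.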
arXiv Detail & Related papers (2021-02-21T03:13:27Z) - Feature Purification: How Adversarial Training Performs Robust Deep
Learning [66.05472746340142]
We present a principle that we call Feature Purification: one of the causes of the existence of adversarial examples is the accumulation of certain small, dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.