Regularisation Can Mitigate Poisoning Attacks: A Novel Analysis Based on
Multiobjective Bilevel Optimisation
- URL: http://arxiv.org/abs/2003.00040v2
- Date: Sat, 20 Jun 2020 13:44:48 GMT
- Title: Regularisation Can Mitigate Poisoning Attacks: A Novel Analysis Based on
Multiobjective Bilevel Optimisation
- Authors: Javier Carnerero-Cano, Luis Muñoz-González, Phillippa Spencer and
Emil C. Lupu
- Abstract summary: Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance.
Optimal poisoning attacks, which can be formulated as bilevel problems, help to assess the robustness of learning algorithms in worst-case scenarios.
We show that assuming the hyperparameters stay constant under attack leads to an overly pessimistic view of the robustness of the algorithms.
- Score: 3.3181276611945263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a
fraction of the training data is manipulated to deliberately degrade the
algorithms' performance. Optimal poisoning attacks, which can be formulated as
bilevel optimisation problems, help to assess the robustness of learning
algorithms in worst-case scenarios. However, current attacks against algorithms
with hyperparameters typically assume that these hyperparameters remain
constant, ignoring the effect the attack has on them. We show that this approach
leads to an overly pessimistic view of the robustness of the algorithms. We
propose a novel optimal attack formulation that considers the effect of the
attack on the hyperparameters by modelling the attack as a multiobjective
bilevel optimisation problem. We apply this novel attack formulation to ML
classifiers using $L_2$ regularisation and show that, in contrast to results
previously reported, $L_2$ regularisation enhances the stability of the
learning algorithms and helps to mitigate the attacks. Our empirical evaluation
on different datasets confirms the limitations of previous strategies,
evidences the benefits of using $L_2$ regularisation to dampen the effect of
poisoning attacks and shows how the regularisation hyperparameter increases
with the fraction of poisoning points.
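For intuition, the multiobjective bilevel formulation described in the abstract can be
sketched as follows; the notation here is assumed for illustration and is not taken
verbatim from the paper ($D_{tr}$: clean training set, $D_p$: poisoning points,
$D_{val}$: clean validation set, $w$: model parameters, $\lambda$: the $L_2$
regularisation hyperparameter, $\mathcal{L}$: the classification loss). The attacker
picks $D_p$ to degrade validation performance, while the learner both fits $w$ on the
poisoned data and selects $\lambda$:
$$\max_{D_p}\ \mathcal{L}\big(D_{val},\, w^{\star}(\lambda^{\star})\big)
\quad \text{s.t.} \quad
\lambda^{\star} \in \arg\min_{\lambda}\ \mathcal{L}\big(D_{val},\, w^{\star}(\lambda)\big),
\qquad
w^{\star}(\lambda) \in \arg\min_{w}\ \mathcal{L}\big(D_{tr} \cup D_p,\, w\big) + \lambda \lVert w \rVert_2^2 .$$
Earlier attack formulations hold $\lambda$ fixed while optimising $D_p$; letting $\lambda$
react to the poisoning is what yields the less pessimistic robustness picture reported above.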
Related papers
- Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation [49.480978190805125]
Transfer attacks have generated significant interest for black-box applications.
Existing works essentially optimize a single-level objective directly w.r.t. the surrogate model.
We propose a bilevel optimization paradigm, which explicitly reformulates the nested relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker.
arXiv Detail & Related papers (2024-06-04T07:45:27Z) - Efficient Adversarial Training in LLMs with Continuous Attacks [99.5882845458567]
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails.
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses.
C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv Detail & Related papers (2024-05-24T14:20:09Z) - Hyperparameter Learning under Data Poisoning: Analysis of the Influence
of Regularization via Multiobjective Bilevel Optimization [3.3181276611945263]
Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance.
Optimal attacks can be formulated as bilevel optimization problems and help to assess the algorithms' robustness in worst-case scenarios.
arXiv Detail & Related papers (2023-06-02T15:21:05Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial
Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically.
Our method learns the optimizer in adversarial attacks, parameterized by a recurrent neural network.
We develop a model-agnostic training algorithm to improve the generalization ability of the learned optimizer when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z) - Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep neural networks (DNNs) by only perturbing a few pixels.
Recent efforts combine this with an additional $l_\infty$ constraint on the perturbation magnitudes.
We propose a homotopy algorithm to jointly tackle both the sparsity and the perturbation-magnitude constraints.
arXiv Detail & Related papers (2021-06-10T20:11:36Z) - Regularization Can Help Mitigate Poisoning Attacks... with the Right
Hyperparameters [1.8570591025615453]
Machine learning algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to degrade the algorithms' performance.
We show that current approaches, which typically assume that regularization hyperparameters remain constant, lead to an overly pessimistic view of the algorithms' robustness.
We propose a novel optimal attack formulation that considers the effect of the attack on the hyperparameters, modelling the attack as a minimax bilevel optimization problem.
arXiv Detail & Related papers (2021-05-23T14:34:47Z) - Adversarial examples attack based on random warm restart mechanism and
improved Nesterov momentum [0.0]
Some studies have pointed out that deep learning models are vulnerable to adversarial-example attacks and can be made to produce false decisions.
We propose the RWR-NM-PGD attack algorithm, based on a random warm restart mechanism and improved Nesterov momentum; a minimal sketch of this general style of attack is given after this list.
Our method achieves an average attack success rate of 46.3077%, which is 27.19% higher than I-FGSM and 9.27% higher than PGD.
arXiv Detail & Related papers (2021-05-10T07:24:25Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
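To make the attack styles surveyed above concrete, the following is a minimal, self-contained sketch of an $l_\infty$-bounded gradient attack that combines random warm restarts with a Nesterov-momentum update, as described in the RWR-NM-PGD entry; the toy logistic-regression victim, the loss, and all hyperparameter values are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Illustrative sketch (not the authors' code): PGD-style attack on a toy
# logistic-regression model, combining random warm restarts with a
# Nesterov-momentum update in the spirit of the RWR-NM-PGD entry above.
import numpy as np

rng = np.random.default_rng(0)

# Toy victim model: fixed logistic-regression weights (assumed for illustration).
w, b = np.array([1.5, -2.0]), 0.3

def loss_and_grad(x, y):
    """Cross-entropy loss of the victim model and its gradient w.r.t. the input x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = (p - y) * w                              # d loss / d x for a linear logit
    return loss, grad

def pgd_nesterov_warm_restart(x0, y, eps=0.5, alpha=0.05, steps=40, restarts=5, mu=0.9):
    """L_inf-bounded gradient-ascent attack with Nesterov momentum and random warm restarts."""
    best_x, best_loss = x0.copy(), loss_and_grad(x0, y)[0]
    for _ in range(restarts):
        # Random warm restart: begin from a random point inside the eps-ball around x0.
        x = np.clip(x0 + rng.uniform(-eps, eps, size=x0.shape), x0 - eps, x0 + eps)
        v = np.zeros_like(x)                        # momentum buffer
        for _ in range(steps):
            # Nesterov look-ahead: take the gradient at the anticipated point x + mu * v.
            _, g = loss_and_grad(x + mu * v, y)
            v = mu * v + alpha * np.sign(g)         # signed-gradient ascent step
            x = np.clip(x + v, x0 - eps, x0 + eps)  # project back into the eps-ball
        cur_loss, _ = loss_and_grad(x, y)
        if cur_loss > best_loss:                    # keep the strongest restart
            best_x, best_loss = x, cur_loss
    return best_x, best_loss

x_clean = np.array([0.2, 0.1])
x_adv, adv_loss = pgd_nesterov_warm_restart(x_clean, y=1)
print("clean loss:", loss_and_grad(x_clean, 1)[0], "adversarial loss:", adv_loss)
```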