Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization
- URL: http://arxiv.org/abs/2411.03752v1
- Date: Wed, 06 Nov 2024 08:27:49 GMT
- Title: Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization
- Authors: Yuhao He, Jinyu Tian, Xianwei Zheng, Li Dong, Yuanman Li, Leo Yu Zhang, Jiantao Zhou
- Abstract summary: We introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack.
This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise.
We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks.
- Score: 39.37308843208039
- License:
- Abstract: Recent studies have shown that deep learning models are very vulnerable to poisoning attacks, and many defense methods have been proposed to address this issue. However, traditional poisoning attacks are not as threatening as commonly believed, because they often cause a noticeable gap between the model's performance on the training set and on the validation set. Such inconsistency can alert defenders that their data has been poisoned, allowing them to take the necessary defensive actions. In this paper, we introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack. This attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise. We achieve this by ensuring that the poisoned model's loss takes a value similar to that of a normally trained model at each input sample, but with a large local curvature. The similar loss ensures that there is no obvious inconsistency between training and validation accuracy, which demonstrates high stealthiness. The large curvature, in turn, implies that a small perturbation may cause a significant increase in model loss and hence substantial performance degradation, i.e., worse robustness. We fulfill this purpose by making the model's Hessian singular at the optimal point via our proposed Singularization Regularization term. We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks. Furthermore, we have confirmed the hazards of this form of poisoning attack under more general scenarios involving natural noise, offering a new perspective for research in the field of security.
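To make the stated objective concrete, here is a minimal sketch of the kind of per-sample quantity such an attack could drive down: keep the loss close to that of a cleanly trained reference (stealth) while inflating the local curvature of the loss around the input (fragility). The finite-difference curvature proxy, the weighting `lam`, and the function name are illustrative assumptions, not the paper's exact Singularization Regularization term; the bilevel crafting of the poisoned training data itself is also omitted.

```python
# Illustrative sketch only: the paper's Singularization Regularization term is
# not reproduced here. `lam`, `eps`, and the finite-difference curvature proxy
# are assumptions mirroring the stated goal (matched loss, large curvature).
import torch
import torch.nn.functional as F

def deferred_poisoning_proxy(model, x, y, clean_loss_ref, lam=1.0, eps=1e-3):
    """Loss-matching term minus a curvature term w.r.t. the input (to be minimized)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)

    # Gradient of the loss with respect to the input at x.
    g = torch.autograd.grad(loss, x, create_graph=True)[0]

    # Finite-difference Hessian-vector proxy: how quickly the input gradient
    # changes along a random unit direction v, i.e. roughly ||H v||.
    v = torch.randn_like(x)
    v = v / (v.norm() + 1e-12)
    g_shift = torch.autograd.grad(
        F.cross_entropy(model(x + eps * v), y), x, create_graph=True
    )[0]
    curvature = ((g_shift - g) / eps).norm()

    # Small value => loss looks normal (stealthy) yet curvature is large, so a
    # tiny input perturbation or natural noise can blow the loss up.
    return (loss - clean_loss_ref) ** 2 - lam * curvature
```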
Related papers
- RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks [20.55681622921858]
Model poisoning attacks greatly jeopardize the application of federated learning (FL).
In this work, we propose a novel proactive defense named RECESS against model poisoning attacks.
Unlike previous methods that score each iteration, RECESS considers clients' performance correlation across multiple iterations to estimate the trust score.
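As a rough illustration of scoring clients across rounds rather than per round, the sketch below tracks how consistently each client's update agrees with the aggregate via an exponential moving average; the EMA form and the decay `beta` are assumptions, not the exact RECESS scoring rule.

```python
# Hedged sketch of cross-round trust scoring; not the RECESS algorithm itself.
import torch
import torch.nn.functional as F

def update_trust_scores(trust, client_updates, beta=0.9):
    """trust: dict[client_id, float]; client_updates: dict[client_id, 1-D torch.Tensor]."""
    # Reference direction: the mean of this round's (flattened) client updates.
    agg = torch.stack(list(client_updates.values())).mean(dim=0)
    for cid, upd in client_updates.items():
        sim = F.cosine_similarity(upd, agg, dim=0).item()
        # Accumulate agreement across rounds (EMA) instead of judging a single
        # round in isolation, so transiently noisy but honest clients are not penalized.
        trust[cid] = beta * trust.get(cid, sim) + (1 - beta) * sim
    return trust
```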
arXiv Detail & Related papers (2023-10-09T06:09:01Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely Memorization Discrepancy, to explore defenses via model-level information.
By implicitly transferring changes in the data manipulation into changes in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
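As a hedged sketch of the model-level idea (the paper's exact Memorization Discrepancy measure may differ), one can compare the current model's predictions with those of a recent snapshot on a fixed probe batch and treat unusually large shifts as a sign of accumulative poisoning:

```python
# Illustrative output-shift measure; not the paper's exact definition.
import torch
import torch.nn.functional as F

@torch.no_grad()
def output_discrepancy(model_before, model_after, probe_x):
    """KL divergence between predictions of a pre-update snapshot and the current model."""
    log_p_before = F.log_softmax(model_before(probe_x), dim=1)
    p_after = F.softmax(model_after(probe_x), dim=1)
    # KL(p_after || p_before): unusually large values suggest the latest update
    # shifted the model's behaviour more than a benign batch normally would.
    return F.kl_div(log_p_before, p_after, reduction="batchmean").item()
```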
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Sharpness-Aware Data Poisoning Attack [38.01535347191942]
Recent research has highlighted the vulnerability of Deep Neural Networks (DNNs) against data poisoning attacks.
We propose a novel attack method called "Sharpness-Aware Data Poisoning Attack (SAPA)".
In particular, it leverages the concept of DNNs' loss landscape sharpness to optimize the poisoning effect on the worst re-trained model.
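For context, the loss-landscape sharpness notion referred to here can be sketched as a SAM-style worst-case weight perturbation; SAPA's actual poison-crafting objective is not reproduced, and the radius `rho` is an assumption.

```python
# Hedged sketch of a sharpness estimate at the current weights.
import torch
import torch.nn.functional as F

def loss_sharpness(model, x, y, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Ascend to an approximate worst-case point within an L2 ball of radius rho,
    # measure the loss there, then restore the original weights.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / grad_norm)
        perturbed_loss = F.cross_entropy(model(x), y)
        for p, g in zip(params, grads):
            p.sub_(rho * g / grad_norm)

    # Larger gap => sharper local minimum around the current weights.
    return (perturbed_loss - loss).item()
```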
arXiv Detail & Related papers (2023-05-24T08:00:21Z)
- Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks [33.82164201455115]
Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning.
Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack.
We argue that this approach overlooks a crucial trade-off: attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning).
arXiv Detail & Related papers (2023-05-07T15:58:06Z)
- Indiscriminate Poisoning Attacks Are Shortcuts [77.38947817228656]
We find that the perturbations of advanced poisoning attacks are almost linearly separable when assigned the target labels of the corresponding samples.
We show that such synthetic perturbations are as powerful as the deliberately crafted attacks.
Our finding suggests that the shortcut learning problem is more serious than previously believed.
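One way to picture the separability claim: fit a plain linear classifier on the perturbations alone, labelled with their target classes; near-perfect training accuracy means the perturbations themselves act as linearly separable shortcuts. The helper below is illustrative, not the paper's exact protocol.

```python
# Hedged separability check on perturbations delta = poisoned - clean.
import numpy as np
from sklearn.linear_model import LogisticRegression

def perturbation_separability(x_clean, x_poisoned, labels):
    """x_clean, x_poisoned: arrays of shape (N, ...); labels: (N,) target classes."""
    deltas = np.asarray(x_poisoned, dtype=np.float64) - np.asarray(x_clean, dtype=np.float64)
    deltas = deltas.reshape(len(labels), -1)
    clf = LogisticRegression(max_iter=2000).fit(deltas, labels)
    # Training accuracy near 1.0 means the perturbations alone already separate
    # the classes, i.e. they act as learnable shortcuts.
    return clf.score(deltas, labels)
```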
arXiv Detail & Related papers (2021-11-01T12:44:26Z)
- Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
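The gradient-matching idea can be sketched as aligning the gradient produced by the poisoned batch with the gradient of the adversarial target loss via cosine similarity; the full clean-label crafting loop, restarts, and perturbation constraints are omitted, and the signature below is illustrative.

```python
# Simplified gradient-matching objective (minimize w.r.t. the poison perturbations).
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, poison_x, poison_y, target_x, adv_y):
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient the attacker wants the victim's training to follow: push the
    # target sample(s) toward the adversarial label adv_y.
    adv_loss = F.cross_entropy(model(target_x), adv_y)
    adv_grad = torch.autograd.grad(adv_loss, params)

    # Gradient actually produced by the (perturbed) poison batch with its clean labels.
    poison_loss = F.cross_entropy(model(poison_x), poison_y)
    poison_grad = torch.autograd.grad(poison_loss, params, create_graph=True)

    # Negative cosine similarity, averaged over parameter tensors; minimizing it
    # aligns the two gradients so ordinary training on the poisons does the attacker's work.
    sim = sum(F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
              for a, b in zip(adv_grad, poison_grad))
    return -sim / len(params)
```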
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
- Model-Targeted Poisoning Attacks with Provable Convergence [19.196295769662186]
In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a corrupted model.
We consider poisoning attacks against convex machine learning models and propose an efficient poisoning attack designed to induce a specified model.
arXiv Detail & Related papers (2020-06-30T01:56:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.