Exploring Model Dynamics for Accumulative Poisoning Discovery
- URL: http://arxiv.org/abs/2306.03726v1
- Date: Tue, 6 Jun 2023 14:45:24 GMT
- Title: Exploring Model Dynamics for Accumulative Poisoning Discovery
- Authors: Jianing Zhu, Xiawei Guo, Jiangchao Yao, Chao Du, Li He, Shuo Yuan,
Tongliang Liu, Liang Wang, Bo Han
- Abstract summary: We propose a novel information measure, namely, Memorization Discrepancy, to explore defenses via model-level information.
By implicitly transferring changes in the data manipulation to changes in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
- Score: 62.08553134316483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial poisoning attacks pose huge threats to various machine learning
applications. Especially, the recent accumulative poisoning attacks show that
it is possible to achieve irreparable harm on models via a sequence of
imperceptible attacks followed by a trigger batch. Due to the limited
data-level discrepancy in real-time data streaming, current defensive methods
are indiscriminate in handling the poison and clean samples. In this paper, we
dive into the perspective of model dynamics and propose a novel information
measure, namely, Memorization Discrepancy, to explore the defense via the
model-level information. By implicitly transferring the changes in the data
manipulation to those in the model outputs, Memorization Discrepancy can
discover the imperceptible poison samples based on their distinct dynamics from
the clean samples. We thoroughly explore its properties and propose
Discrepancy-aware Sample Correction (DSC) to defend against accumulative
poisoning attacks. Extensive experiments comprehensively characterize
Memorization Discrepancy and verify its effectiveness. The code is publicly
available at: https://github.com/tmlr-group/Memorization-Discrepancy.
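The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of that idea, assuming a classifier trained on a data stream: score each incoming sample by how differently an earlier model snapshot and the current model respond to it, and treat high-discrepancy samples as suspicious. The KL-divergence score, the threshold, and the function names are illustrative assumptions, not the paper's exact formulation of Memorization Discrepancy or DSC.

```python
# Illustrative sketch only: compares a past model snapshot with the current
# model to flag samples whose output dynamics differ sharply (assumption:
# KL divergence over softmax outputs is used as the discrepancy score).
import torch
import torch.nn.functional as F


@torch.no_grad()
def memorization_discrepancy_score(model_past, model_now, x):
    """Per-sample divergence between a past snapshot and the current model."""
    model_past.eval()
    model_now.eval()
    log_p_past = F.log_softmax(model_past(x), dim=1)
    p_now = F.softmax(model_now(x), dim=1)
    # KL(now || past), computed per sample (no batch reduction).
    return F.kl_div(log_p_past, p_now, reduction="none").sum(dim=1)


def split_by_discrepancy(model_past, model_now, x, threshold):
    """Split a batch into (likely-clean, suspicious) parts by score."""
    scores = memorization_discrepancy_score(model_past, model_now, x)
    suspicious = scores > threshold
    return x[~suspicious], x[suspicious]
```

In practice, the threshold could be calibrated on a held-out clean batch, for example as a high percentile of the scores observed on known-clean data.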
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models [12.42597979026873]
We propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones.
Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy.
arXiv Detail & Related papers (2023-12-18T09:40:38Z)
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
- Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data [26.551317580666353]
Backdoor attacks pose a serious security threat for training neural networks.
We propose a novel approach that enables model training on potentially poisoned datasets by utilizing the power of recent diffusion models.
arXiv Detail & Related papers (2023-10-10T07:25:06Z)
- On the Exploitability of Instruction Tuning [103.8077787502381]
In this work, we investigate how an adversary can exploit instruction tuning to change a model's behavior.
We propose AutoPoison, an automated data poisoning pipeline.
Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data.
arXiv Detail & Related papers (2023-06-28T17:54:04Z)
- Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning Attacks [31.339252233416477]
We introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters.
We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models.
Our work highlights the critical role played by the poisoning ratio and sheds new light on existing empirical results, attacks, and mitigation strategies in data poisoning.
arXiv Detail & Related papers (2023-03-07T01:55:26Z)
- Temporal Robustness against Data Poisoning [69.01705108817785]
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data.
We propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted.
arXiv Detail & Related papers (2023-02-07T18:59:19Z)
- Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.