Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks
- URL: http://arxiv.org/abs/2105.03692v4
- Date: Thu, 27 Apr 2023 04:38:45 GMT
- Title: Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks
- Authors: Charles Jin, Melinda Sun, Martin Rinard
- Abstract summary: We propose a novel clustering mechanism based on an incompatibility property between subsets of data that emerges during model training.
This mechanism partitions the dataset into subsets that generalize only to themselves, i.e., training on one subset does not improve performance on the other subsets.
We apply our clustering mechanism to defend against data poisoning attacks, in which the attacker injects malicious data into the training dataset to affect the trained model's output.
- Score: 4.988182188764627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel clustering mechanism based on an incompatibility property
between subsets of data that emerges during model training. This mechanism
partitions the dataset into subsets that generalize only to themselves, i.e.,
training on one subset does not improve performance on the other subsets.
Leveraging the interaction between the dataset and the training process, our
clustering mechanism partitions datasets into clusters that are defined by--and
therefore meaningful to--the objective of the training process.
We apply our clustering mechanism to defend against data poisoning attacks,
in which the attacker injects malicious poisoned data into the training dataset
to affect the trained model's output. Our evaluation focuses on backdoor
attacks against deep neural networks trained to perform image classification
using the GTSRB and CIFAR-10 datasets. Our results show that (1) these attacks
produce poisoned datasets in which the poisoned and clean data are incompatible
and (2) our technique successfully identifies (and removes) the poisoned data.
In an end-to-end evaluation, our defense reduces the attack success rate to
below 1% on 134 out of 165 scenarios, with only a 2% drop in clean accuracy on
CIFAR-10 and a negligible drop in clean accuracy on GTSRB.
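The clustering criterion described in the abstract lends itself to a small illustration. Below is a minimal, hypothetical sketch of the cross-generalization check between two candidate subsets, using scikit-learn on synthetic data rather than the paper's deep-network setting; the function names, decision threshold, and toy data are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: two subsets are treated as "incompatible" when a model
# trained on one does not generalize to the other much better than chance.
import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_generalization(X_a, y_a, X_b, y_b):
    """Accuracy on subset B of a model trained only on subset A."""
    model = LogisticRegression(max_iter=1000).fit(X_a, y_a)
    return model.score(X_b, y_b)

def incompatible(X_a, y_a, X_b, y_b, threshold=0.55):
    """Heuristic check (threshold is an illustrative assumption for binary labels)."""
    return (cross_generalization(X_a, y_a, X_b, y_b) < threshold
            and cross_generalization(X_b, y_b, X_a, y_a) < threshold)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Clean-like subset: the label depends on the first feature.
    X_clean = rng.normal(size=(500, 2))
    y_clean = (X_clean[:, 0] > 0).astype(int)
    # Poison-like subset: the label depends on an unrelated, trigger-like feature.
    X_poison = rng.normal(size=(500, 2))
    y_poison = (X_poison[:, 1] > 0).astype(int)
    print(incompatible(X_clean, y_clean, X_poison, y_poison))  # expected: True
```

In this toy setup the two subsets encode conflicting label rules, so neither model transfers to the other, which is the kind of incompatibility the abstract describes between clean and poisoned data.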
Related papers
- Variance-Based Defense Against Blended Backdoor Attacks [0.0]
Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models.
We propose a novel defense method that trains a model on the given dataset, detects poisoned classes, and extracts the critical part of the attack trigger.
arXiv Detail & Related papers (2025-06-02T09:01:35Z)
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model. (A toy frequency-domain sketch appears after this list.)
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
- Towards Attack-tolerant Federated Learning via Critical Parameter Analysis [85.41873993551332]
Federated learning systems are susceptible to poisoning attacks when malicious clients send false updates to the central server.
This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis).
Our attack-tolerant aggregation method is based on the observation that benign local models have similar sets of top-k and bottom-k critical parameters, whereas poisoned local models do not. (A toy similarity check is sketched after this list.)
arXiv Detail & Related papers (2023-08-18T05:37:55Z)
- Membership Inference Attacks by Exploiting Loss Trajectory [19.900473800648243]
We propose a new attack method that exploits membership information from the whole training process of the target model.
Our attack achieves at least a 6× higher true-positive rate than existing methods at a low false-positive rate of 0.1%.
arXiv Detail & Related papers (2022-08-31T16:02:26Z)
- Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems.
These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions.
In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z)
- Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z)
- DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only trained models but not the training data.
We propose a completely novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z)
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubts on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
arXiv Detail & Related papers (2022-03-31T18:06:28Z)
- Mitigating the Impact of Adversarial Attacks in Very Deep Networks [10.555822166916705]
Deep Neural Network (DNN) models have security-related vulnerabilities.
Data poisoning-enabled perturbation attacks are a complex class of adversarial attacks that inject false data into models.
We propose an attack-agnostic defense method for mitigating their influence.
arXiv Detail & Related papers (2020-12-08T21:25:44Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Towards Class-Oriented Poisoning Attacks Against Neural Networks [1.14219428942199]
Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset.
We propose a class-oriented poisoning attack that is capable of forcing the corrupted model to predict in two specific ways.
To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework.
arXiv Detail & Related papers (2020-07-31T19:27:37Z)
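As a side note on the FreqFed entry above, the frequency-domain idea can be illustrated with a short, hypothetical sketch: project each client's flattened update with a DCT, keep the low-frequency coefficients, and retain only the majority cluster. The function names, truncation length, and two-cluster majority rule are assumptions for illustration, not the FreqFed aggregation rule.

```python
# Hypothetical sketch of frequency-domain filtering of client updates
# (not the FreqFed implementation).
import numpy as np
from scipy.fft import dct
from sklearn.cluster import KMeans

def low_freq_signature(update, n_coeffs=64):
    """DCT of a flattened update, truncated to its lowest-frequency coefficients."""
    return dct(np.ravel(update), norm="ortho")[:n_coeffs]

def filter_updates(updates, n_coeffs=64):
    """Keep only the updates whose frequency signatures fall in the larger of two clusters."""
    sigs = np.stack([low_freq_signature(u, n_coeffs) for u in updates])
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(sigs)
    majority = np.bincount(labels).argmax()
    return [u for u, label in zip(updates, labels) if label == majority]
```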
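Similarly, for the FedCPA entry above, the observation about top-k and bottom-k critical parameters can be turned into a toy similarity score. The names and the Jaccard-based scoring below are illustrative assumptions, not the paper's aggregation method.

```python
# Hypothetical sketch: score agreement between two local updates on which
# parameters matter most and least (not the FedCPA implementation).
import numpy as np

def critical_sets(update, k=100):
    """Indices of the k largest- and k smallest-magnitude parameters."""
    order = np.argsort(np.abs(np.ravel(update)))
    return set(order[-k:]), set(order[:k])

def critical_similarity(update_a, update_b, k=100):
    """Average Jaccard overlap of the top-k and bottom-k critical-parameter sets."""
    top_a, bot_a = critical_sets(update_a, k)
    top_b, bot_b = critical_sets(update_b, k)
    jaccard = lambda s, t: len(s & t) / len(s | t)
    return 0.5 * (jaccard(top_a, top_b) + jaccard(bot_a, bot_b))
```

Benign updates would be expected to score relatively high against each other, while a poisoned update would overlap less, which is the intuition the FedCPA summary states.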
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.