Defending Against Neural Network Model Inversion Attacks via Data Poisoning
- URL: http://arxiv.org/abs/2412.07575v1
- Date: Tue, 10 Dec 2024 15:08:56 GMT
- Authors: Shuai Zhou, Dayong Ye, Tianqing Zhu, Wanlei Zhou
- Abstract summary: Model inversion attacks pose a significant privacy threat to machine learning models.
This paper introduces a novel defense mechanism to better balance privacy and utility.
We propose a strategy that leverages data poisoning to contaminate the training data of inversion models.
- Abstract: Model inversion attacks pose a significant privacy threat to machine learning models by reconstructing sensitive data from their outputs. While various defenses have been proposed to counteract these attacks, they often come at the cost of the classifier's utility, thus creating a challenging trade-off between privacy protection and model utility. Moreover, most existing defenses require retraining the classifier for enhanced robustness, which is impractical for large-scale, well-established models. This paper introduces a novel defense mechanism to better balance privacy and utility, particularly against adversaries who employ a machine learning model (i.e., inversion model) to reconstruct private data. Drawing inspiration from data poisoning attacks, which can compromise the performance of machine learning models, we propose a strategy that leverages data poisoning to contaminate the training data of inversion models, thereby preventing model inversion attacks. Two defense methods are presented. The first, termed label-preserving poisoning attacks for all output vectors (LPA), involves subtle perturbations to all output vectors while preserving their labels. Our findings demonstrate that these minor perturbations, introduced through a data poisoning approach, significantly increase the difficulty of data reconstruction without compromising the utility of the classifier. Subsequently, we introduce a second method, label-flipping poisoning for partial output vectors (LFP), which selectively perturbs a small subset of output vectors and alters their labels during the process. Empirical results indicate that LPA is notably effective, outperforming the current state-of-the-art defenses. Our data poisoning-based defense provides a new retraining-free defense paradigm that preserves the victim classifier's utility.
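The LPA method above perturbs each released output vector while keeping its argmax label intact, so the classifier's predictions are unchanged but an inversion model trained on the released vectors is poisoned. The sketch below illustrates that label-preserving constraint; the function name, noise distribution, and `epsilon` budget are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def lpa_perturb(output_vec, epsilon=0.05, rng=None):
    """Label-preserving perturbation of a softmax output vector (sketch).

    Adds small bounded noise, renormalizes onto the probability simplex,
    and verifies that the predicted label (argmax) is unchanged.
    """
    rng = rng or np.random.default_rng()
    label = int(np.argmax(output_vec))
    for _ in range(100):  # retry until the label survives the perturbation
        noise = rng.uniform(-epsilon, epsilon, size=output_vec.shape)
        poisoned = np.clip(output_vec + noise, 1e-8, None)
        poisoned = poisoned / poisoned.sum()  # renormalize to sum to 1
        if int(np.argmax(poisoned)) == label:
            return poisoned
    return output_vec  # fall back to the clean vector
```

LFP differs in that it perturbs only a small subset of output vectors and deliberately flips their labels, trading a stronger poisoning signal against touching fewer outputs.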
Related papers
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
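GRO's summary describes a min-max objective: keep the protected model's loss low while driving the attacker's surrogate model's loss high. A minimal sketch of that combined objective, assuming a simple weighted difference with a hypothetical trade-off weight `lam`:

```python
def gro_objective(loss_target, loss_surrogate, lam=1.0):
    """Min-max objective behind GRO (sketch; `lam` is an assumed weight).

    Training minimizes this value, which keeps the protected model's
    loss low while pushing the surrogate model's loss high.
    """
    return loss_target - lam * loss_surrogate
```

Lowering `loss_surrogate` (a better surrogate) raises the objective, so gradient descent on it implicitly degrades the attacker's extraction.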
arXiv Detail & Related papers (2023-10-25T03:30:42Z) - Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations to model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs for stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z) - Boosting Model Inversion Attacks with Adversarial Examples [26.904051413441316]
We propose a new training paradigm for a learning-based model inversion attack that can achieve higher attack accuracy in a black-box setting.
First, we regularize the training process of the attack model with an added semantic loss function.
Second, we inject adversarial examples into the training data to increase the diversity of the class-related parts.
arXiv Detail & Related papers (2023-06-24T13:40:58Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
The proposed defense, MESAS, is the first to be robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z) - DODEM: DOuble DEfense Mechanism Against Adversarial Attacks Towards Secure Industrial Internet of Things Analytics [8.697883716452385]
We propose a double defense mechanism to detect and mitigate adversarial attacks in I-IoT environments.
We first detect if there is an adversarial attack on a given sample using novelty detection algorithms.
If there is an attack, adversarial retraining provides a more robust model, while we apply standard training for regular samples.
arXiv Detail & Related papers (2023-01-23T22:10:40Z) - Defending against the Label-flipping Attack in Federated Learning [5.769445676575767]
Federated learning (FL) provides autonomy and privacy by design to participating peers.
The label-flipping (LF) attack is a targeted poisoning attack where the attackers poison their training data by flipping the labels of some examples.
We propose a novel defense that first dynamically extracts the attack-relevant gradients from the peers' local updates.
arXiv Detail & Related papers (2022-07-05T12:02:54Z) - Gradient-based Data Subversion Attack Against Binary Classifiers [9.414651358362391]
In this work, we focus on label contamination attack in which an attacker poisons the labels of data to compromise the functionality of the system.
We exploit the gradients of a differentiable convex loss function with respect to the predicted label as a warm-start and formulate different strategies to find a set of data instances to contaminate.
Our experiments show that the proposed approach outperforms the baselines and is computationally efficient.
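The gradient warm-start described above can be illustrated for a binary classifier with the logistic loss: the per-sample gradient of the loss with respect to the label measures how sensitive the loss is to a flip, and the largest-magnitude samples are contaminated first. The function name, the top-k selection rule, and the {-1, +1} label encoding are illustrative assumptions.

```python
import numpy as np

def select_flip_candidates(scores, labels, k):
    """Pick the k examples whose logistic loss is most sensitive to a
    label flip (sketch of a gradient warm-start for label contamination).

    scores: real-valued classifier margins; labels: array in {-1, +1}.
    """
    # d/dy of the logistic loss log(1 + exp(-y * s)) at the given label y
    grad = -scores / (1.0 + np.exp(labels * scores))
    idx = np.argsort(-np.abs(grad))[:k]  # largest-impact samples first
    poisoned = labels.copy()
    poisoned[idx] = -poisoned[idx]       # flip the selected labels
    return idx, poisoned
```

Confidently classified, large-margin samples yield large gradient magnitudes here, so the warm-start naturally targets the points whose corruption moves the decision boundary most.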
arXiv Detail & Related papers (2021-05-31T09:04:32Z) - Adversarial Poisoning Attacks and Defense for General Multi-Class Models Based On Synthetic Reduced Nearest Neighbors [14.968442560499753]
State-of-the-art machine learning models are vulnerable to data poisoning attacks.
This paper proposes a novel model-free label-flipping attack based on the multi-modality of the data.
Second, a novel defense technique based on the Synthetic Reduced Nearest Neighbor (SRNN) model is proposed.
arXiv Detail & Related papers (2021-02-11T06:55:40Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that publishes only labels is still susceptible to sampling attacks, and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.