LDL: A Defense for Label-Based Membership Inference Attacks
- URL: http://arxiv.org/abs/2212.01688v1
- Date: Sat, 3 Dec 2022 20:55:10 GMT
- Title: LDL: A Defense for Label-Based Membership Inference Attacks
- Authors: Arezoo Rajabi, Dinuka Sahabandu, Luyao Niu, Bhaskar Ramasubramanian,
Radha Poovendran
- Abstract summary: Overfitted deep neural network (DNN) models are susceptible to query-based attacks.
A new class of label-based MIAs (LAB MIAs) was proposed, in which an adversary only requires knowledge of the predicted labels of samples.
We present LDL, a lightweight defense against LAB MIAs.
- Score: 5.542528986254584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The data used to train deep neural network (DNN) models in applications such
as healthcare and finance typically contain sensitive information. A DNN model
may suffer from overfitting. Overfitted models have been shown to be
susceptible to query-based attacks such as membership inference attacks (MIAs).
MIAs aim to determine whether a sample belongs to the dataset used to train a
classifier (members) or not (nonmembers). Recently, a new class of label-based
MIAs (LAB MIAs) was proposed, where an adversary was only required to have
knowledge of predicted labels of samples. Developing a defense against an
adversary carrying out a LAB MIA on DNN models that cannot be retrained remains
an open problem.
We present LDL, a lightweight defense against LAB MIAs. LDL works by
constructing a high-dimensional sphere around queried samples such that the
model decision is unchanged for (noisy) variants of the sample within the
sphere. This sphere of label-invariance creates ambiguity and prevents a
querying adversary from correctly determining whether a sample is a member or a
nonmember. We analytically characterize the success rate of an adversary
carrying out a LAB MIA when LDL is deployed, and show that the formulation is
consistent with experimental observations. We evaluate LDL on seven datasets --
CIFAR-10, CIFAR-100, GTSRB, Face, Purchase, Location, and Texas -- with varying
sizes of training data. All of these datasets have been used by SOTA LAB MIAs.
Our experiments demonstrate that LDL reduces the success rate of an adversary
carrying out a LAB MIA in each case. We empirically compare LDL with defenses
against LAB MIAs that require retraining of DNN models, and show that LDL
performs favorably despite not needing to retrain the DNNs.
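The abstract describes LDL's mechanism only at a high level: the model's answer is made invariant over noisy variants of the query inside a small sphere, so a label-only attacker probing decision-boundary robustness sees similar behavior for members and nonmembers. The sketch below is a minimal illustration of that idea, not the authors' exact algorithm; it assumes a PyTorch classifier returning logits, and the noise scale `sigma` and sample count `n_samples` are illustrative parameters chosen here for the example.

```python
# Minimal sketch (not the authors' exact LDL construction): answer each query
# with the majority label over noisy variants drawn from a small sphere around
# the input, so that member and nonmember queries look equally "stable" to a
# label-only attacker.
import torch


def label_invariant_predict(model, x, sigma=0.05, n_samples=32):
    """Return a label that is constant over noisy variants of x.

    model      -- classifier mapping a batch of inputs to logits (assumed)
    x          -- a single query tensor, e.g. shape (C, H, W)
    sigma      -- scale of the noise ball (illustrative value)
    n_samples  -- number of noisy variants to aggregate (illustrative value)
    """
    model.eval()
    with torch.no_grad():
        # Draw noisy variants of the query inside a Gaussian "sphere".
        noise = sigma * torch.randn(n_samples, *x.shape)
        variants = (x.unsqueeze(0) + noise).clamp(0.0, 1.0)
        preds = model(variants).argmax(dim=1)  # one label per noisy variant
        # Majority vote: the same label is returned throughout the sphere,
        # which blunts the distance-to-boundary signal used by LAB MIAs.
        return torch.mode(preds).values.item()
```

Under this kind of smoothing, a label-only attacker that estimates robustness to perturbations would measure comparable robustness for training and non-training samples, which is the ambiguity the abstract refers to.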
Related papers
- LLM Robustness Against Misinformation in Biomedical Question Answering [50.98256373698759]
The retrieval-augmented generation (RAG) approach is used to reduce the confabulation of large language models (LLMs) for question answering.
We evaluate the effectiveness and robustness of four LLMs against misinformation in answering biomedical questions.
arXiv Detail & Related papers (2024-10-27T16:23:26Z)
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data.
Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership.
We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
arXiv Detail & Related papers (2024-10-10T03:31:16Z) - Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that using instructions proposed by other LLMs can open a new avenue for automated attacks.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
- Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models [4.081098869497239]
We develop state-of-the-art privacy attacks against Large Language Models (LLMs).
New membership inference attacks (MIAs) against pretrained LLMs perform hundreds of times better than baseline attacks.
In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance.
arXiv Detail & Related papers (2024-02-26T20:41:50Z)
- Double-Dip: Thwarting Label-Only Membership Inference Attacks with Transfer Learning and Randomization [2.6121142662033923]
A class of privacy attacks called membership inference attacks (MIAs) aims to determine whether a given sample belongs to the training dataset (member) or not (nonmember).
arXiv Detail & Related papers (2024-02-02T03:14:37Z)
- MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning [6.510488168434277]
The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model.
We propose an enhanced Membership Inference Attack with a Batch-wise generated Attack Dataset (MIA-BAD).
We show how training an ML model through federated learning (FL) has some distinct advantages and investigate how the threat introduced by the proposed MIA-BAD approach can be mitigated with FL approaches.
arXiv Detail & Related papers (2023-11-28T06:51:26Z)
- Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration [32.15773300068426]
Membership Inference Attacks (MIAs) aim to infer whether a target data record has been utilized for model training or not.
We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA).
Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal.
arXiv Detail & Related papers (2023-11-10T13:55:05Z)
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- Inaccurate Label Distribution Learning [56.89970970094207]
Label distribution learning (LDL) trains a model to predict the relevance of a set of labels (called a label distribution (LD)) to an instance.
This paper investigates the problem of inaccurate LDL, i.e., developing an LDL model with noisy LDs.
arXiv Detail & Related papers (2023-02-25T06:23:45Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)