Data and Model Dependencies of Membership Inference Attack
- URL: http://arxiv.org/abs/2002.06856v5
- Date: Sat, 25 Jul 2020 06:25:58 GMT
- Title: Data and Model Dependencies of Membership Inference Attack
- Authors: Shakila Mahjabin Tonni, Dinusha Vatsalan, Farhad Farokhi, Dali Kaafar,
Zhigang Lu and Gioacchino Tangari
- Abstract summary: We provide an empirical analysis of the impact of both the data and ML model properties on the vulnerability of ML techniques to MIA.
Our results reveal the relationship between MIA accuracy and properties of the dataset and training model in use.
We propose using those data and model properties as regularizers to protect ML models against MIA.
- Score: 13.951470844348899
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) models have been shown to be vulnerable to Membership
Inference Attacks (MIA), which infer the membership of a given data point in
the target dataset by observing the prediction output of the ML model. While
the key factors for the success of MIA have not yet been fully understood,
existing defense mechanisms such as using L2 regularization [Shokri et al., 2017]
and dropout layers [Salem et al., 2018] take only
the model's overfitting property into consideration. In this paper, we provide
an empirical analysis of the impact of both the data and ML model properties on
the vulnerability of ML techniques to MIA. Our results reveal the relationship
between MIA accuracy and properties of the dataset and training model in use.
In particular, we show that the size of the shadow dataset, the class and feature
balance and the entropy of the target dataset, and the configuration and fairness
of the training model are the most influential factors. Based on those
experimental findings, we conclude that along with model overfitting, multiple
properties jointly contribute to MIA success instead of any single property.
Building on our experimental findings, we propose using those data and model
properties as regularizers to protect ML models against MIA. Our results show
that the proposed defense mechanisms can reduce the MIA accuracy by up to 25%
without sacrificing the ML model prediction utility.
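To make the attack setting concrete, below is a minimal sketch of the shadow-model membership inference attack that this line of work analyses (in the spirit of Shokri et al., 2017), assuming black-box access to a target classifier's predicted probabilities. The estimator choices, the 50/50 split, and the auxiliary data are illustrative placeholders, not the paper's exact experimental setup, and the property-based regularizer defense proposed in the abstract is not implemented here.

```python
# Hedged sketch: shadow-model membership inference (Shokri et al., 2017 style).
# Assumes the shadow and target models expose predict_proba over the same classes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def build_attack_dataset(shadow_model, X_member, X_nonmember):
    """Confidence vectors labelled 1 (member of the shadow training set) or 0 (non-member)."""
    p_in = shadow_model.predict_proba(X_member)
    p_out = shadow_model.predict_proba(X_nonmember)
    X_att = np.vstack([p_in, p_out])
    y_att = np.concatenate([np.ones(len(p_in)), np.zeros(len(p_out))])
    return X_att, y_att

def membership_scores(X_shadow, y_shadow, target_model, X_query):
    # Split the attacker's auxiliary data into shadow-train (members) and held-out (non-members).
    X_in, X_out, y_in, _ = train_test_split(X_shadow, y_shadow, test_size=0.5, random_state=0)
    shadow = MLPClassifier(max_iter=300, random_state=0).fit(X_in, y_in)  # mimics the target
    X_att, y_att = build_attack_dataset(shadow, X_in, X_out)
    attacker = RandomForestClassifier(random_state=0).fit(X_att, y_att)
    # Score the queried points using only the target model's prediction output.
    return attacker.predict_proba(target_model.predict_proba(X_query))[:, 1]
```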
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
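As a rough illustration of gradient-based data attribution (not the paper's diffusion-specific influence estimator), one can score each training example by how well its loss gradient aligns with a test example's loss gradient. The sketch below assumes a generic PyTorch model and loss supplied by the caller.

```python
# Simplified first-order proxy for influence-style data attribution.
# The paper develops a proper influence functions framework for diffusion models;
# this sketch only shows the gradient-alignment idea on a generic torch model.
import torch

def loss_gradient(model, loss_fn, x, y):
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.detach().flatten()
                      for p in model.parameters() if p.grad is not None])

def attribution_scores(model, loss_fn, train_examples, test_example):
    x_t, y_t = test_example
    g_test = loss_gradient(model, loss_fn, x_t, y_t)
    # Higher score = the training example's gradient points in the same direction
    # as the test example's gradient, i.e. it is more "responsible" for that loss.
    return [torch.dot(loss_gradient(model, loss_fn, x, y), g_test).item()
            for x, y in train_examples]
```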
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z) - Active Fourier Auditor for Estimating Distributional Properties of ML Models [10.581140430698103]
We focus on three properties: robustness, individual fairness, and group fairness.
We develop a new framework that quantifies different properties in terms of the Fourier coefficients of the ML model under audit.
We derive high probability error bounds on AFA's estimates, along with the worst-case lower bounds on the sample complexity to audit them.
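For intuition only: the Fourier coefficient of a Boolean-valued model over a subset S of input bits is an expectation that can be estimated by sampling, which is the kind of quantity such an auditor works with. The generic estimator below is not the AFA algorithm itself, and the example model h is made up.

```python
# The Fourier coefficient \hat{f}(S) of f: {-1,+1}^n -> {-1,+1} is
# E_x[f(x) * prod_{i in S} x_i] under uniform x, estimated here by Monte Carlo.
import numpy as np

def estimate_fourier_coefficient(f, n, S, num_samples=10000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(num_samples, n))   # uniform Boolean inputs
    chi_S = np.prod(x[:, S], axis=1)                 # parity (character) on S
    return np.mean(f(x) * chi_S)                     # sample average of f(x) * chi_S(x)

# Example: a thresholded "model" h(x) = sign(sum of the first 3 coordinates + 0.5)
h = lambda x: np.sign(x[:, :3].sum(axis=1) + 0.5)
print(estimate_fourier_coefficient(h, n=10, S=[0]))  # weight the model puts on bit 0
```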
arXiv Detail & Related papers (2024-10-10T16:57:01Z) - Decomposing and Editing Predictions by Modeling Model Computation [75.37535202884463]
We introduce a task called component modeling.
The goal of component modeling is to decompose an ML model's prediction in terms of its components.
We present COAR, a scalable algorithm for estimating component attributions.
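One way to read the component-modeling goal: ablate random subsets of components, record how the prediction shifts, and fit a linear surrogate whose coefficients act as per-component attributions. The sketch below captures that spirit with a hypothetical predict_with_mask hook; it is not the COAR algorithm from the paper.

```python
# Rough sketch of ablation-based component attribution with a linear surrogate.
# predict_with_mask(mask) is a hypothetical hook that runs the model with the
# masked-out components disabled and returns a scalar prediction.
import numpy as np
from sklearn.linear_model import Ridge

def component_attributions(predict_with_mask, n_components, n_trials=500, keep_prob=0.9, seed=0):
    rng = np.random.default_rng(seed)
    masks = (rng.random((n_trials, n_components)) < keep_prob).astype(float)  # 1 = keep component
    outputs = np.array([predict_with_mask(m) for m in masks])
    surrogate = Ridge(alpha=1.0).fit(masks, outputs)
    return surrogate.coef_  # one attribution score per component
```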
arXiv Detail & Related papers (2024-04-17T16:28:08Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
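A minimal sketch of the text-similarity signal mentioned above, assuming black-box access to the summarizer and a reference summary for the candidate document: the closer the model's output is to the reference, the stronger the membership signal. The rouge-score package is an illustrative similarity choice, and the threshold would need calibration on known non-member documents.

```python
# Hedged sketch of a similarity-based membership inference signal for summarization.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def membership_signal(generate_summary, document, reference_summary):
    produced = generate_summary(document)  # black-box access to the target summarizer
    return scorer.score(reference_summary, produced)["rougeL"].fmeasure

def infer_membership(generate_summary, document, reference_summary, threshold=0.5):
    # threshold is a placeholder; calibrate it on documents known to be non-members
    return membership_signal(generate_summary, document, reference_summary) >= threshold
```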
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Towards Better Modeling with Missing Data: A Contrastive Learning-based
Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z) - Oversampling Higher-Performing Minorities During Machine Learning Model
Training Reduces Adverse Impact Slightly but Also Reduces Model Accuracy [18.849426971487077]
We systematically under- and oversampled minority (Black and Hispanic) applicants to manipulate adverse impact ratios in training data.
We found that training data adverse impact related linearly to ML model adverse impact.
We observed consistent effects across self-reports and interview transcripts, whether oversampling real or synthetic observations.
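To make the manipulation concrete, here is a hedged sketch of oversampling one subgroup of a hypothetical applicant dataframe before model training; the column names and oversampling factor are illustrative, not the study's protocol.

```python
# Illustrative resampling of a subgroup to shift the training data's group ratios.
import pandas as pd
from sklearn.utils import resample

def oversample_group(df, group_col="ethnicity", group_value="minority", factor=2.0, seed=0):
    group = df[df[group_col] == group_value]
    rest = df[df[group_col] != group_value]
    boosted = resample(group, replace=True, n_samples=int(len(group) * factor), random_state=seed)
    return pd.concat([rest, boosted]).sample(frac=1.0, random_state=seed)  # shuffle rows
```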
arXiv Detail & Related papers (2023-04-27T02:53:29Z) - An Investigation of Smart Contract for Collaborative Machine Learning
Model Training [3.5679973993372642]
Collaborative machine learning (CML) has penetrated various fields in the era of big data.
As the training of ML models requires a massive amount of good quality data, it is necessary to eliminate concerns about data privacy.
Based on blockchain, smart contracts enable automatic execution of data preserving and validation.
arXiv Detail & Related papers (2022-09-12T04:25:01Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)