Black Box Differential Privacy Auditing Using Total Variation Distance
- URL: http://arxiv.org/abs/2406.04827v2
- Date: Fri, 5 Jul 2024 21:38:38 GMT
- Title: Black Box Differential Privacy Auditing Using Total Variation Distance
- Authors: Antti Koskela, Jafar Mohammadi
- Abstract summary: We present a practical method to audit the differential privacy guarantees of a machine learning model using a small hold-out dataset.
Our method estimates the total variation (TV) distance between scores obtained with a subset of the training data and the hold-out dataset.
- Score: 3.830092569453011
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a practical method to audit the differential privacy (DP) guarantees of a machine learning model using a small hold-out dataset that is not exposed to the model during training. Given a score function, such as the loss function employed during training, our method estimates the total variation (TV) distance between scores obtained with a subset of the training data and scores obtained with the hold-out dataset. With some meta information about the underlying DP training algorithm, these TV distance values can be converted to $(\varepsilon,\delta)$-guarantees for any $\delta$. We show that these score distributions asymptotically give lower bounds for the DP guarantees of the underlying training algorithm; however, for practicality we perform a one-shot estimation. We specify conditions under which the estimates lower-bound the DP guarantees with high probability. To estimate the TV distance between the score distributions, we use a simple histogram-based density estimation method. We show that the resulting TV distance estimator is close to optimally robust and has an error rate of $\mathcal{O}(k^{-1/3})$, where $k$ is the total number of samples. Numerical experiments on benchmark datasets illustrate the effectiveness of our approach and show improvements over baseline methods for black-box auditing.
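To make the core estimator concrete, here is a minimal sketch of the histogram-based TV distance estimate between two score samples. The bin-count heuristic $n_\text{bins} \approx k^{1/3}$ is an assumption chosen to match the stated $\mathcal{O}(k^{-1/3})$ error rate; the paper's exact binning rule may differ.

```python
import numpy as np

def tv_distance_histogram(scores_train, scores_holdout, n_bins=None):
    """Estimate the TV distance between two score samples via histograms.

    Minimal sketch of a histogram density estimator; the n_bins ~ k^(1/3)
    heuristic is an assumption, not necessarily the paper's rule.
    """
    k = len(scores_train) + len(scores_holdout)
    if n_bins is None:
        n_bins = max(1, int(round(k ** (1 / 3))))
    lo = min(scores_train.min(), scores_holdout.min())
    hi = max(scores_train.max(), scores_holdout.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(scores_train, bins=bins)
    q, _ = np.histogram(scores_holdout, bins=bins)
    p = p / p.sum()
    q = q / q.sum()
    # TV distance between the two empirical (binned) distributions
    return 0.5 * np.abs(p - q).sum()

# Example: loss scores on a training subset vs. the hold-out set
rng = np.random.default_rng(0)
tv = tv_distance_histogram(rng.normal(0.0, 1.0, 5000),
                           rng.normal(0.5, 1.0, 5000))
print(f"estimated TV distance: {tv:.3f}")
```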
Related papers
- Resampling methods for private statistical inference [1.8110941972682346]
We consider the task of constructing confidence intervals with differential privacy.
We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data.
For a fixed differential privacy parameter $\varepsilon$, our methods enjoy the same error rates as the non-private bootstrap, up to logarithmic factors in the sample size $n$.
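As a rough illustration of the "median of little bootstraps" idea, here is a hedged sketch: partition the data, run one small bootstrap per partition, and release the median of the per-partition estimates through the exponential mechanism. The candidate grid, the rank-based utility, and the single-bootstrap-per-partition simplification are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def dp_median_of_little_bootstraps(data, stat, eps, m=20, grid=None, rng=None):
    """Sketch: per-partition bootstrap estimates, DP median via the
    exponential mechanism over a bounded candidate grid (sensitivity 1,
    since each data point influences exactly one partition's estimate)."""
    rng = rng or np.random.default_rng()
    parts = np.array_split(rng.permutation(data), m)
    ests = []
    for part in parts:
        boot = rng.choice(part, size=len(part), replace=True)  # one "little" bootstrap
        ests.append(stat(boot))
    ests = np.sort(np.asarray(ests))
    if grid is None:
        grid = np.linspace(ests.min(), ests.max(), 256)
    # Utility of candidate c: minus its rank distance from the median rank.
    ranks = np.searchsorted(ests, grid)
    utility = -np.abs(ranks - m / 2)
    weights = np.exp(eps * utility / 2.0)  # exponential mechanism, sensitivity 1
    weights /= weights.sum()
    return rng.choice(grid, p=weights)

sample = np.random.default_rng(1).normal(10.0, 2.0, 10_000)
print(dp_median_of_little_bootstraps(sample, np.mean, eps=1.0))
```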
arXiv Detail & Related papers (2024-02-11T08:59:02Z) - Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions [49.726408540784334]
A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over a specified discrete support.
We show that our pessimistic estimate is the best possible among all pessimistic estimates.
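For intuition, a pessimistic discretization can be obtained by rounding every privacy-loss value up to the next point of the discrete support, which can only overestimate the resulting $(\varepsilon,\delta)$ curve. The sketch below shows this naive round-up; the paper's connect-the-dots construction is tighter, splitting each point's mass between its two neighbouring grid points, and is shown to be optimal among pessimistic estimates.

```python
import numpy as np

def pessimistic_round_up(losses, probs, grid):
    """Naive pessimistic PLD discretization: move each privacy-loss value
    up to the smallest grid point >= it (clamping at the top of the grid).
    Illustrative baseline only; connect-the-dots is strictly tighter."""
    grid = np.sort(np.asarray(grid, dtype=float))
    out = np.zeros_like(grid)
    idx = np.searchsorted(grid, losses, side="left")
    for i, p in zip(idx, probs):
        out[min(i, len(grid) - 1)] += p
    return grid, out

# Example: a PLD with three atoms, discretized onto a coarse grid
support, pmf = pessimistic_round_up([0.12, 0.37, 0.81], [0.2, 0.5, 0.3],
                                    grid=np.arange(0.0, 1.01, 0.25))
print(support, pmf)
```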
arXiv Detail & Related papers (2022-07-10T04:25:02Z) - Bayesian Estimation of Differential Privacy [0.0]
Differentially Private SGD enables training machine learning models with formal privacy guarantees.
There is a discrepancy between the protection that such algorithms guarantee in theory and the protection they afford in practice.
This paper empirically estimates the protection afforded by differentially private training as a confidence interval for the privacy budget.
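A common frequentist counterpart of this idea, sketched below under our own assumptions, turns a membership-inference attack's measured true/false positive rates into an interval-based lower bound on $\varepsilon$ via the $(\varepsilon,\delta)$-DP constraint $\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta$; the cited paper instead takes a Bayesian route to an interval for the privacy budget.

```python
import numpy as np
from scipy.stats import beta

def eps_lower_bound(tp, n_pos, fp, n_neg, delta=1e-5, alpha=0.05):
    """Clopper-Pearson bounds on attack TPR/FPR, converted to a lower
    bound on epsilon via TPR <= exp(eps) * FPR + delta.  Illustrative."""
    tpr_low = 0.0 if tp == 0 else beta.ppf(alpha / 2, tp, n_pos - tp + 1)
    fpr_high = 1.0 if fp == n_neg else beta.ppf(1 - alpha / 2, fp + 1, n_neg - fp)
    if tpr_low - delta <= 0:
        return 0.0  # no nontrivial bound at this confidence level
    return max(0.0, float(np.log((tpr_low - delta) / fpr_high)))

# Example: attack identifies 900/1000 members, with 100/1000 false positives
print(eps_lower_bound(tp=900, n_pos=1000, fp=100, n_neg=1000))
```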
arXiv Detail & Related papers (2022-06-10T15:57:18Z) - Differentially Private Federated Learning via Inexact ADMM with Multiple Local Updates [0.0]
We develop a DP inexact alternating direction method of multipliers algorithm with multiple local updates for federated learning.
We show that our algorithm provides $\bar{\varepsilon}$-DP for every iteration, where $\bar{\varepsilon}$ is a privacy budget controlled by the user.
We demonstrate that our algorithm reduces the testing error by at most 31% compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2022-02-18T19:58:47Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
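A minimal sketch of the ATC idea, using the max-confidence score variant as a simplification: choose the threshold so that the fraction of source examples above it matches source accuracy, then report the fraction of target examples above it.

```python
import numpy as np

def atc_predict_accuracy(src_conf, src_correct, tgt_conf):
    """Average Thresholded Confidence (ATC), max-confidence variant.
    Learn threshold t on labeled source data, then predict target accuracy
    as the fraction of unlabeled target examples with confidence above t."""
    src_acc = np.mean(src_correct)
    t = np.quantile(src_conf, 1.0 - src_acc)  # fraction above t ~= source accuracy
    return float(np.mean(tgt_conf > t))

# Example with synthetic confidences
rng = np.random.default_rng(0)
src_conf = rng.uniform(0.5, 1.0, 2000)
src_correct = rng.uniform(size=2000) < src_conf   # toy calibrated model
tgt_conf = rng.uniform(0.4, 1.0, 2000)            # shifted target distribution
print(atc_predict_accuracy(src_conf, src_correct, tgt_conf))
```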
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Differentially Private Federated Learning via Inexact ADMM [0.0]
Differential privacy (DP) techniques can be applied to the federated learning model to protect data privacy against inference attacks.
We develop a DP inexact alternating direction method of multipliers algorithm that solves a sequence of trust-region subproblems.
Our algorithm reduces the testing error by at most 22% compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2021-06-11T02:28:07Z) - Achieving differential privacy for $k$-nearest neighbors based outlier detection by data partitioning [0.3437656066916039]
We develop a differentially private ($\varepsilon$-DP) approach for $k$-NN based outlier detection.
We propose a method that separates the procedure into a fitting step on reference inlier data, followed by applying the outlier classifier to new data.
Our approach yields nearly optimal performance on real-world data with varying dimensions when compared to the non-private versions of $k$-NN.
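One way to realize DP through partitioning, sketched below under our own assumptions (a subsample-and-aggregate style vote, not necessarily the paper's exact scheme): each partition of the reference inlier data votes on whether a query point is an outlier, and the vote count is released with Laplace noise.

```python
import numpy as np

def dp_knn_outlier_vote(x, reference, k=5, m=10, radius=1.0, eps=1.0, rng=None):
    """Sketch: each of m partitions of the reference inliers votes that x
    is an outlier if its k-th nearest neighbour lies farther than `radius`;
    the vote count is released with Laplace noise (sensitivity 1, since
    each reference point lies in exactly one partition)."""
    rng = rng or np.random.default_rng()
    parts = np.array_split(rng.permutation(reference, axis=0), m)
    votes = 0
    for part in parts:
        d = np.sort(np.linalg.norm(part - x, axis=1))
        if d[min(k, len(d)) - 1] > radius:  # k-th nearest neighbour distance
            votes += 1
    noisy = votes + rng.laplace(scale=1.0 / eps)
    return noisy > m / 2  # True => flagged as outlier

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(500, 2))
print(dp_knn_outlier_vote(np.array([5.0, 5.0]), inliers, rng=rng))
```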
arXiv Detail & Related papers (2021-04-16T07:35:26Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models collaboratively among distributed clients.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages of our algorithms over standard gradient methods on two popular deep learning tasks.
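The noisy gradient oracle underlying such methods is the standard DP-SGD-style clip-then-noise step; a minimal sketch follows (the adaptive variants layer momentum or adaptivity on top of the same oracle; parameter names here are our own):

```python
import numpy as np

def dp_gradient_step(w, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD-style update: clip each per-example gradient to L2 norm
    `clip`, average, add Gaussian noise scaled by noise_mult * clip / B."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    g = clipped.mean(axis=0)
    g += rng.normal(0.0, noise_mult * clip / len(per_example_grads), size=g.shape)
    return w - lr * g

# Example: one noisy step on a batch of 32 per-example gradients in R^10
rng = np.random.default_rng(0)
w = np.zeros(10)
grads = rng.normal(size=(32, 10))
print(dp_gradient_step(w, grads, rng=rng))
```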
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid-updating scheme, and match the accuracy of softmax models.
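A hedged sketch of the core scoring step: features are compared to per-class centroids with an RBF kernel in a single forward pass, and a low maximum similarity flags a point as out-of-distribution. The length scale and rejection-by-threshold below are simplifications of the paper's setup.

```python
import numpy as np

def centroid_uncertainty(features, centroids, length_scale=1.0):
    """DUQ-style scoring sketch: RBF similarity of each feature vector to
    every class centroid; predict the closest class, and treat a low best
    similarity as high uncertainty (candidate OOD point)."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2 / (2 * length_scale ** 2))  # RBF kernel per class
    pred = sim.argmax(axis=1)
    confidence = sim.max(axis=1)                 # low => likely OOD
    return pred, confidence

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
cents = rng.normal(size=(3, 8))  # 3 classes, 8-dim feature space
pred, conf = centroid_uncertainty(feats, cents)
print(pred, conf)                # reject points whose conf falls below a threshold
```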
arXiv Detail & Related papers (2020-03-04T12:27:36Z)