DPpack: An R Package for Differentially Private Statistical Analysis and
Machine Learning
- URL: http://arxiv.org/abs/2309.10965v1
- Date: Tue, 19 Sep 2023 23:36:11 GMT
- Title: DPpack: An R Package for Differentially Private Statistical Analysis and
Machine Learning
- Authors: Spencer Giddens and Fang Liu
- Abstract summary: Differential privacy (DP) is the state-of-the-art framework for guaranteeing privacy for individuals when releasing aggregated statistics or building statistical/machine learning models from data.
We develop the open-source R package DPpack that provides a large toolkit for differentially private analysis.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential privacy (DP) is the state-of-the-art framework for guaranteeing
privacy for individuals when releasing aggregated statistics or building
statistical/machine learning models from data. We develop the open-source R
package DPpack that provides a large toolkit for differentially private
analysis. The current version of DPpack implements three popular mechanisms for
ensuring DP: Laplace, Gaussian, and exponential. Beyond that, DPpack provides a
large toolkit of easily accessible privacy-preserving descriptive statistics
functions. These include mean, variance, covariance, and quantiles, as well as
histograms and contingency tables. Finally, DPpack provides user-friendly
implementations of privacy-preserving versions of logistic regression, SVM, and
linear regression, as well as differentially private hyperparameter tuning for
each of these models. This extensive collection of implemented differentially
private statistics and models permits hassle-free utilization of differential
privacy principles in commonly performed statistical analyses. We plan to
continue developing DPpack, making it more comprehensive by adding further
differentially private machine learning techniques and tools for statistical
modeling and inference.
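To make the abstract concrete, here is a minimal base-R sketch of the three mechanisms named above. The function names and calibrations are illustrative assumptions, not DPpack's API; the package exports its own implementations (the CRAN documentation describes LaplaceMechanism(), GaussianMechanism(), and ExponentialMechanism()), whose exact signatures should be checked there.

```r
# Sample from Laplace(0, b) as the difference of two exponentials.
rlaplace <- function(n, b) rexp(n, rate = 1 / b) - rexp(n, rate = 1 / b)

# eps-DP Laplace mechanism: noise scale = sensitivity / eps.
laplace_mech <- function(true_value, eps, sensitivity) {
  true_value + rlaplace(length(true_value), sensitivity / eps)
}

# (eps, delta)-DP Gaussian mechanism, classical calibration (valid for eps < 1):
# sigma = sensitivity * sqrt(2 * log(1.25 / delta)) / eps.
gaussian_mech <- function(true_value, eps, delta, sensitivity) {
  sigma <- sensitivity * sqrt(2 * log(1.25 / delta)) / eps
  true_value + rnorm(length(true_value), sd = sigma)
}

# eps-DP exponential mechanism: select one candidate with probability
# proportional to exp(eps * utility / (2 * sensitivity of the utility)).
exponential_mech <- function(candidates, utilities, eps, sensitivity) {
  w <- exp(eps * utilities / (2 * sensitivity))
  sample(candidates, size = 1, prob = w / sum(w))
}
```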
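Building on that, a hypothetical DP mean for data with known bounds shows the pattern behind the descriptive-statistics functions (DPpack's documentation describes an analogous meanDP()): clamp the data, compute the sensitivity, and apply a mechanism. The argument names below are assumptions for illustration, reusing rlaplace() from the sketch above.

```r
# Hypothetical eps-DP mean for data assumed to lie in [lower, upper].
mean_dp <- function(x, eps, lower, upper) {
  x <- pmin(pmax(x, lower), upper)       # clamp to the assumed bounds
  sens <- (upper - lower) / length(x)    # sensitivity of the bounded mean
  mean(x) + rlaplace(1, sens / eps)      # add calibrated Laplace noise
}

# Example: a privacy-preserving mean of ages bounded in [18, 90], eps = 1.
ages <- c(23, 35, 41, 29, 57, 62, 38)
mean_dp(ages, eps = 1, lower = 18, upper = 90)
```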
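For the models, a standard construction in this literature, output perturbation for regularized empirical risk minimization (Chaudhuri et al., 2011), gives a flavor of how DP logistic regression can work. The sketch below is a simplified illustration assuming every feature row has L2 norm at most 1; it is not the interface of DPpack's own model classes.

```r
# Hypothetical eps-DP logistic regression via output perturbation.
# Assumes rows of X satisfy ||x_i||_2 <= 1 and y is in {0, 1}.
logreg_dp <- function(X, y, eps, lambda) {
  n <- nrow(X); d <- ncol(X)
  # L2-regularized negative log-likelihood, averaged over the data.
  obj <- function(w) {
    z <- as.vector(X %*% w)
    mean(log(1 + exp(z)) - y * z) + (lambda / 2) * sum(w^2)
  }
  w_hat <- optim(rep(0, d), obj, method = "BFGS")$par
  # The minimizer's L2 sensitivity is 2 / (n * lambda), so draw noise with
  # density proportional to exp(-(n * lambda * eps / 2) * ||b||): a
  # Gamma(d, scale = 2 / (n * lambda * eps)) norm in a uniform direction.
  b_norm <- rgamma(1, shape = d, scale = 2 / (n * lambda * eps))
  dir <- rnorm(d); dir <- dir / sqrt(sum(dir^2))
  w_hat + b_norm * dir
}
```

Predictions then come from the noisy coefficients as usual, e.g. plogis(X_new %*% w_priv) > 0.5.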
Related papers
- Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding machine learning datasets at industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private stochastic gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models that privatizes zeroth-order optimization methods.
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is an increasingly popular tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a differential privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - Large Scale Transfer Learning for Differentially Private Image
Classification [51.10365553035979]
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example-level privacy.
Private training using DP-SGD protects against leakage by clipping and injecting noise into individual example gradients (a minimal sketch appears after this list).
While this result is quite appealing, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training.
arXiv Detail & Related papers (2022-05-06T01:22:20Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - Differentially Private (Gradient) Expectation Maximization Algorithm
with Statistical Guarantees [25.996994681543903]
(Gradient) Expectation Maximization (EM) is a widely used algorithm for estimating the maximum likelihood of mixture models or incomplete data problems.
Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM.
In this paper, we propose the first DP version of the (Gradient) EM algorithm with statistical guarantees.
arXiv Detail & Related papers (2020-10-22T03:41:19Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
However, an adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.