Differentially Private (Gradient) Expectation Maximization Algorithm
with Statistical Guarantees
- URL: http://arxiv.org/abs/2010.13520v3
- Date: Sun, 16 Jan 2022 22:00:44 GMT
- Title: Differentially Private (Gradient) Expectation Maximization Algorithm
with Statistical Guarantees
- Authors: Di Wang and Jiahao Ding and Lijie Hu and Zejun Xie and Miao Pan and
Jinhui Xu
- Abstract summary: (Gradient) Expectation Maximization (EM) is a widely used algorithm for estimating the maximum likelihood of mixture models or incomplete data problems.
Previous research on this problem has already led to the discovery of some Differentially Private (DP) algorithms for (Gradient) EM.
We propose in this paper the first DP version of (Gradient) EM algorithm with statistical guarantees.
- Score: 25.996994681543903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: (Gradient) Expectation Maximization (EM) is a widely used algorithm for
estimating the maximum likelihood of mixture models or incomplete data
problems. A major challenge facing this popular technique is how to effectively
preserve the privacy of sensitive data. Previous research on this problem has
already led to the discovery of some Differentially Private (DP) algorithms
for (Gradient) EM. However, unlike in the non-private case, existing techniques
are not yet able to provide finite sample statistical guarantees. To address
this issue, we propose in this paper the first DP version of (Gradient) EM
algorithm with statistical guarantees. Moreover, we apply our general framework
to three canonical models: Gaussian Mixture Model (GMM), Mixture of Regressions
Model (MRM) and Linear Regression with Missing Covariates (RMC). Specifically,
for GMM in the DP model, our estimation error is near optimal in some cases.
For the other two models, we provide the first finite sample statistical
guarantees. Our theory is supported by thorough numerical experiments.
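To make the idea concrete, here is a minimal sketch of a differentially private gradient EM iteration for a symmetric two-component GMM (means at $\pm\theta$, unit variance): the E-step computes responsibilities, and the M-step takes one clipped, Gaussian-noised gradient step. This is an illustrative sketch of the general mechanism, not the paper's exact algorithm; the parameters `clip`, `sigma`, and `lr` are hypothetical.

```python
import numpy as np

def dp_gradient_em_gmm(x, theta0, steps=50, lr=0.5, clip=4.0,
                       sigma=0.1, rng=None):
    """Sketch of DP gradient EM for 0.5*N(theta,1) + 0.5*N(-theta,1).

    E-step: posterior responsibilities for the positive component.
    M-step: one gradient step on the Q-function, with per-sample
    gradients clipped to bound sensitivity and Gaussian noise added
    (Gaussian mechanism). Illustrative only.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    theta = theta0
    for _ in range(steps):
        w = 1.0 / (1.0 + np.exp(-2.0 * theta * x))   # P(component +1 | x_i)
        g_i = (2.0 * w - 1.0) * x - theta            # per-sample gradient of Q
        g_i = np.clip(g_i, -clip, clip)              # sensitivity <= 2*clip/n
        noise = rng.normal(0.0, sigma * 2.0 * clip / n)
        theta = theta + lr * (g_i.mean() + noise)
    return theta
```

With the noise scale calibrated to the clipped gradient's sensitivity, each iteration is a Gaussian-mechanism release; the paper's contribution is showing that such iterates still admit finite-sample statistical error bounds.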
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Generating Private Synthetic Data with Genetic Algorithms [29.756119782419955]
We study the problem of efficiently generating differentially private synthetic data that approximates the statistical properties of an underlying sensitive dataset.
We propose Private-GSD, a private genetic algorithm based on zeroth-order optimizations that do not require modifying the original objective.
We show that Private-GSD outperforms the state-of-the-art methods on non-differentiable queries while matching their accuracy in approximating differentiable ones.
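The zeroth-order idea can be sketched in a toy form: evolve a candidate synthetic dataset using only fitness evaluations (query errors), with no gradient of the objective. This hypothetical sketch omits the privacy mechanism entirely; all names and parameters are illustrative, not Private-GSD's actual interface.

```python
import numpy as np

def genetic_fit(target_means, n_rows=100, pop=20, gens=200, rng=0):
    """Toy zeroth-order genetic search: evolve a binary synthetic
    dataset whose column means match target statistics. Selection
    keeps the best half; mutation flips random cells. Illustrative
    of the gradient-free search idea only (no differential privacy).
    """
    rng = np.random.default_rng(rng)
    d = len(target_means)
    population = rng.integers(0, 2, size=(pop, n_rows, d))

    def error(ds):  # worst-case error over the column-mean queries
        return np.max(np.abs(ds.mean(axis=0) - target_means))

    for _ in range(gens):
        scores = np.array([error(ds) for ds in population])
        parents = population[np.argsort(scores)[: pop // 2]]   # selection
        children = parents.copy()
        flips = rng.random(children.shape) < 0.01              # mutation
        children = np.where(flips, 1 - children, children)
        population = np.concatenate([parents, children])
    scores = np.array([error(ds) for ds in population])
    return population[np.argmin(scores)]
```

Because fitness is evaluated on the dataset's answers to the queries, the objective needs no gradients, which is what lets such methods handle non-differentiable queries.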
arXiv Detail & Related papers (2023-06-05T21:19:37Z) - An Application of a Multivariate Estimation of Distribution Algorithm to
Cancer Chemotherapy [59.40521061783166]
Chemotherapy treatment for cancer is a complex optimisation problem with a large number of interacting variables and constraints.
We examine whether the more sophisticated algorithm yields better performance on a complex problem like this, and find that it does not. We hypothesise that this is caused by the more sophisticated algorithm being impeded by the large number of interactions in the problem.
arXiv Detail & Related papers (2022-05-17T15:28:46Z) - DPER: Efficient Parameter Estimation for Randomly Missing Data [0.24466725954625884]
We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set.
Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
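The single-pass idea can be illustrated with a simplified available-case estimator for a bivariate sample with randomly missing entries: each parameter is computed directly from the cases where the needed entries are observed, with no EM-style iteration. This is a hedged sketch in the spirit of DPER, not the paper's exact estimator.

```python
import numpy as np

def direct_mle_missing(x, y):
    """Non-iterative parameter estimates for a bivariate sample with
    entries missing at random (NaN): means and variances use each
    column's observed entries, the covariance uses only the rows
    where both entries are observed. One pass, no iteration.
    Simplified available-case sketch for illustration.
    """
    mx, my = np.nanmean(x), np.nanmean(y)
    vx, vy = np.nanvar(x), np.nanvar(y)
    both = ~np.isnan(x) & ~np.isnan(y)          # rows with both observed
    cov = np.mean((x[both] - mx) * (y[both] - my))
    return mx, my, vx, vy, cov
```

The design point is the speed claim in the summary: each estimate is a closed-form pass over the available cases, so no repeated sweeps through the data are needed.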
arXiv Detail & Related papers (2021-06-06T16:37:48Z) - High-Dimensional Differentially-Private EM Algorithm: Methods and
Near-Optimal Statistical Guarantees [8.089708900273804]
We develop a general framework to design differentially private expectation-maximization (EM) algorithms in high-dimensional latent variable models.
In each model, we establish the near-optimal rate of convergence with differential privacy constraints.
We propose a near rate-optimal EM algorithm with differential privacy guarantees in this setting.
arXiv Detail & Related papers (2021-04-01T04:08:34Z) - Community Detection in the Stochastic Block Model by Mixed Integer
Programming [3.8073142980733]
The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model for generating random graphs with community structure given an expected degree sequence.
The standard approach to community detection based on the DCSBM is to search for the model parameters that are most likely to have produced the observed network data through maximum likelihood estimation (MLE).
We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph.
arXiv Detail & Related papers (2021-01-26T22:04:40Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Robust Compressed Sensing using Generative Models [98.64228459705859]
In this paper we propose an algorithm inspired by the Median-of-Means (MOM) estimator.
Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers.
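The Median-of-Means principle that inspires such guarantees is simple to state: split the sample into buckets, average each bucket, and return the median of the bucket means, so that a few outliers or heavy-tailed draws corrupt at most a few buckets. A minimal sketch (the bucket count `k` is a tuning choice):

```python
import numpy as np

def median_of_means(x, k=10, rng=0):
    """Median-of-Means estimator: shuffle the data, split it into k
    buckets, average each bucket, and return the median of the
    bucket means. Robust to heavy tails and a few gross outliers,
    unlike the plain sample mean."""
    x = np.random.default_rng(rng).permutation(x)
    buckets = np.array_split(x, k)
    return np.median([b.mean() for b in buckets])
```

With k buckets, up to roughly k/2 buckets can be corrupted before the median moves, which is the mechanism behind the heavy-tailed recovery guarantee mentioned above.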
arXiv Detail & Related papers (2020-06-16T19:07:41Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$, the mixing time of the chain.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
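The replay idea can be sketched generically: instead of updating on the most recent (Markov-correlated) sample, store recent samples in a buffer and update on a uniformly drawn past sample, which decorrelates consecutive SGD steps. This is a generic sketch of experience replay for least squares, not the paper's algorithm; `buffer_size` and `lr` are illustrative.

```python
import numpy as np

def sgd_with_replay(X, y, lr=0.05, buffer_size=200, steps=2000, rng=0):
    """SGD for least squares on a correlated (e.g. Markovian) data
    stream: push each arriving sample into a bounded replay buffer,
    then take the gradient step on a uniformly drawn buffered sample
    rather than the newest one. Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    w = np.zeros(X.shape[1])
    buffer = []
    for t in range(steps):
        i = t % len(X)
        buffer.append((X[i], y[i]))              # ingest stream in order
        if len(buffer) > buffer_size:
            buffer.pop(0)
        xj, yj = buffer[rng.integers(len(buffer))]  # uniform draw from buffer
        w -= lr * (xj @ w - yj) * xj                # least-squares SGD step
    return w
```

Sampling uniformly from the buffer breaks the temporal correlation between consecutive updates, which is the intuition for the improved error rate the summary refers to.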
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.