Polynomial Time and Private Learning of Unbounded Gaussian Mixture
Models
- URL: http://arxiv.org/abs/2303.04288v2
- Date: Wed, 7 Jun 2023 23:35:37 GMT
- Title: Polynomial Time and Private Learning of Unbounded Gaussian Mixture
Models
- Authors: Jamil Arbas, Hassan Ashtiani and Christopher Liaw
- Abstract summary: We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components.
We develop a technique to reduce the problem to its non-private counterpart.
We develop an $(\varepsilon, \delta)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox.
- Score: 9.679150363410471
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We study the problem of privately estimating the parameters of
$d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this,
we develop a technique to reduce the problem to its non-private counterpart.
This allows us to privatize existing non-private algorithms in a blackbox
manner, while incurring only a small overhead in the sample complexity and
running time. As the main application of our framework, we develop an
$(\varepsilon, \delta)$-differentially private algorithm to learn GMMs using
the non-private algorithm of Moitra and Valiant [MV10] as a blackbox.
Consequently, this gives the first sample complexity upper bound and first
polynomial time algorithm for privately learning GMMs without any boundedness
assumptions on the parameters. As part of our analysis, we prove a tight (up to
a constant factor) lower bound on the total variation distance of
high-dimensional Gaussians which can be of independent interest.
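The abstract does not spell out the privatization machinery itself. As background only (this is not the paper's construction), the standard Gaussian mechanism that underlies many $(\varepsilon, \delta)$-DP algorithms can be sketched as follows; the sensitivity value and the example statistic are illustrative assumptions:

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=None):
    """Release `value` with (epsilon, delta)-DP via additive Gaussian noise.

    Uses the classical calibration
        sigma = l2_sensitivity * sqrt(2 ln(1.25/delta)) / epsilon,
    which is valid for epsilon <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value)), sigma

# Example: privately release the empirical mean of 100 vectors with norm <= 1
# (so the L2 sensitivity of the mean is 2/100).
x = np.ones(5) * 0.5
noisy, sigma = gaussian_mechanism(x, l2_sensitivity=2.0 / 100,
                                  epsilon=0.5, delta=1e-6)
```

The paper's reduction wraps a non-private learner rather than perturbing a single statistic, so this sketch only illustrates the privacy primitive, not the framework.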
Related papers
- Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters [5.429282997550318]
Our proposed algorithm significantly reduces the runtime complexity per iteration from $\mathcal{O}(NCD^2)$ to a complexity scaling linearly with $D$ and remaining constant with respect to the other factors.
As a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, and observe training times of approximately nine hours on a single state-of-the-art CPU.
arXiv Detail & Related papers (2025-01-21T17:11:25Z) - Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm.
DaRRM is parameterized by a data-dependent noise function $\gamma$, and enables efficient utility optimization over the class of all private algorithms.
We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
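The summary gives only the shape of the mechanism. A toy illustration of randomized-response-style majority ensembling follows; note the constant `gamma` here is an assumption for illustration, not the optimized data-dependent $\gamma$ that DaRRM actually uses:

```python
import numpy as np

def rr_majority(votes, gamma, rng=None):
    """Toy randomized-response majority over binary votes.

    With probability gamma, output the true majority; otherwise output a
    uniform coin flip. (DaRRM optimizes a data-dependent gamma; a constant
    is used here purely to illustrate the template.)
    """
    rng = np.random.default_rng() if rng is None else rng
    maj = int(2 * sum(votes) >= len(votes))
    return maj if rng.random() < gamma else int(rng.integers(0, 2))

# gamma = 1.0 returns the exact (non-private) majority.
out = rr_majority([1, 0, 1, 1, 0], gamma=1.0)
```

Smaller `gamma` injects more randomness, trading utility for privacy; DaRRM's contribution is choosing that trade-off optimally as a function of the data.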
arXiv Detail & Related papers (2024-11-27T00:48:48Z) - Fast Optimal Locally Private Mean Estimation via Random Projections [58.603579803010796]
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball.
We propose a new algorithmic framework, ProjUnit, for private mean estimation.
Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.
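The two-step template described above (random projection, then an optimal unit-vector randomizer) can be sketched as below. The final perturbation is a placeholder Gaussian step standing in for the optimal randomizer, which the summary does not specify:

```python
import numpy as np

def projunit_randomizer(x, k, epsilon, rng=None):
    """Sketch of the ProjUnit template.

    1. Project the input to a random k-dimensional subspace.
    2. Normalize the projection onto the unit sphere.
    3. Run a unit-vector randomizer (placeholder Gaussian noise here,
       NOT the optimal randomizer from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    P = rng.normal(size=(k, d)) / np.sqrt(k)   # random projection matrix
    y = P @ x
    y = y / np.linalg.norm(y)                  # normalize the result
    return y + rng.normal(0.0, 1.0 / epsilon, size=k)  # placeholder step

x = np.random.default_rng(0).normal(size=100)
z = projunit_randomizer(x, k=16, epsilon=1.0)
```

The point of the projection is that the randomizer only needs to operate in $k \ll d$ dimensions, which is where the speedup comes from.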
arXiv Detail & Related papers (2023-06-07T14:07:35Z) - Learning Hidden Markov Models Using Conditional Samples [72.20944611510198]
This paper is concerned with the computational complexity of learning the Hidden Markov Model (HMM).
In this paper, we consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs.
Specifically, we obtain efficient algorithms for learning HMMs in settings where we have query access to the exact conditional probabilities.
arXiv Detail & Related papers (2023-02-28T16:53:41Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
We provide general tools for designing efficient private estimation algorithms.
We give the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Scalable Differentially Private Clustering via Hierarchically Separated
Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although this worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Efficient Mean Estimation with Pure Differential Privacy via a
Sum-of-Squares Exponential Mechanism [16.996435043565594]
We give the first polynomial-time algorithm to estimate the mean of a $d$-variate probability distribution with bounded covariance from $\tilde{O}(d)$ independent samples subject to pure differential privacy.
Our main technique is a new approach to use the powerful Sum of Squares method (SoS) to design differentially private algorithms.
arXiv Detail & Related papers (2021-11-25T09:31:15Z) - Robust Model Selection and Nearly-Proper Learning for GMMs [26.388358539260473]
In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance?
We are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor.
arXiv Detail & Related papers (2021-06-05T01:58:40Z) - Differentially Private ADMM Algorithms for Machine Learning [38.648113004535155]
We study efficient differentially private alternating direction methods of multipliers (ADMM) via gradient perturbation.
We propose the first differentially private ADMM (DP-ADMM) algorithm with performance guarantee of $(\epsilon, \delta)$-differential privacy.
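The summary names gradient perturbation as the privacy mechanism but gives no ADMM details. A generic gradient-perturbation update for a least-squares loss is sketched below; the loss, step size, and noise scale are illustrative assumptions (in DP-ADMM the noise would be calibrated to the gradient sensitivity and the $(\epsilon, \delta)$ budget):

```python
import numpy as np

def noisy_gradient_step(w, X, y, lr, sigma, rng=None):
    """One gradient-perturbation update for the least-squares loss
    0.5 * ||X w - y||^2 / n: Gaussian noise is added to the gradient
    before the step is taken.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    grad = X.T @ (X @ w - y) / n          # exact gradient of the loss
    noise = rng.normal(0.0, sigma, size=w.shape)
    return w - lr * (grad + noise)

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.zeros(2)
w_exact = noisy_gradient_step(w, X, y, lr=1.0, sigma=0.0)  # sigma=0: plain GD
```

In DP-ADMM this kind of perturbed step replaces the exact gradient inside each primal update, so privacy degrades gracefully with the number of iterations.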
arXiv Detail & Related papers (2020-10-31T01:37:24Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.