Private Distribution Learning with Public Data: The View from Sample
Compression
- URL: http://arxiv.org/abs/2308.06239v2
- Date: Mon, 14 Aug 2023 19:26:43 GMT
- Title: Private Distribution Learning with Public Data: The View from Sample
Compression
- Authors: Shai Ben-David, Alex Bie, Clément L. Canonne, Gautam Kamath, Vikrant
Singhal
- Abstract summary: We study the problem of private distribution learning with access to public data.
We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme.
- Score: 15.626115475759713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of private distribution learning with access to public
data. In this setup, which we refer to as public-private learning, the learner
is given public and private samples drawn from an unknown distribution $p$
belonging to a class $\mathcal Q$, with the goal of outputting an estimate of
$p$ while adhering to privacy constraints (here, pure differential privacy)
only with respect to the private samples.
We show that the public-private learnability of a class $\mathcal Q$ is
connected to the existence of a sample compression scheme for $\mathcal Q$, as
well as to an intermediate notion we refer to as list learning. Leveraging this
connection, we: (1) approximately recover previous results on Gaussians over
$\mathbb R^d$; and (2) obtain new ones, including sample complexity upper
bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for
agnostic and distribution-shift resistant learners, as well as closure
properties for public-private learnability under taking mixtures and products
of distributions. Finally, via the connection to list learning, we show that
for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for
private learnability, which is close to the known upper bound of $d+1$ public
samples.
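To make the setup concrete, here is a minimal sketch, in Python, of the list-learning route the abstract describes: public samples (with no privacy constraint) are used to produce a finite list of candidate distributions, and the private samples are used only inside a pure-DP selection step. The helper names `list_learn` and `score` are illustrative assumptions, not the paper's algorithm; the selection step itself is the standard exponential mechanism.

```python
import math
import random

def exponential_mechanism(candidates, score, private_data, epsilon, sensitivity=1.0):
    """Select a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity)) -- the standard pure-DP
    exponential mechanism (McSherry-Talwar)."""
    scores = [score(c, private_data) for c in candidates]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(epsilon * (s - m) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]

def public_private_learn(public_data, private_data, epsilon, list_learn, score):
    # Step 1 (no privacy constraint): use the public samples to produce a
    # finite list of candidates that should contain something close to p.
    # `list_learn` is a hypothetical stand-in for any list learner.
    candidates = list_learn(public_data)
    # Step 2 (pure DP w.r.t. the private samples only): privately select
    # the candidate that best fits the private data.
    return exponential_mechanism(candidates, score, private_data, epsilon)
```

The point of the split is visible in the code: privacy accounting touches only Step 2, so the list learner can be arbitrarily non-private.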
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb R^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
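For contrast, below is a minimal sketch of the standard clip-and-noise baseline whose dimension dependence this entry refers to: clipping each point to an L2 ball and adding coordinate-wise Laplace noise gives pure DP, but the noise scale picks up a $\sqrt d$ factor. The clipping radius and privacy parameter are illustrative assumptions; this is not the paper's anisotropic estimator.

```python
import numpy as np

def private_mean_laplace(samples, radius, epsilon):
    """Clip-and-noise baseline for epsilon-DP mean estimation.
    Changing one point moves the clipped mean by at most 2*radius/n in L2,
    hence at most 2*radius*sqrt(d)/n in L1; Laplace noise at that scale
    gives pure DP. The error grows with d -- the 'curse of dimensionality'
    the paper above targets."""
    n, d = samples.shape
    norms = np.linalg.norm(samples, axis=1, keepdims=True)
    clipped = samples * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    sensitivity = 2.0 * radius * np.sqrt(d) / n  # L1 sensitivity of the mean
    noise = np.random.laplace(scale=sensitivity / epsilon, size=d)
    return mean + noise
```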
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples [9.649879910148854]
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP).
Our main result is that $\text{poly}(k, d, 1/\alpha, 1/\varepsilon, \log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb R^d$ up to total variation distance $\alpha$.
This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs.
arXiv Detail & Related papers (2023-09-07T17:02:32Z)
- Differentially-Private Bayes Consistency [70.92545332158217]
We construct a Bayes consistent learning rule that satisfies differential privacy (DP).
We prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal sample complexity.
arXiv Detail & Related papers (2022-12-08T11:57:30Z)
- Differentially Private Sampling from Distributions [1.452875650827562]
In some regimes, private sampling requires fewer observations than learning a description of $P$ nonprivately.
For some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
arXiv Detail & Related papers (2022-11-15T14:56:42Z)
- New Lower Bounds for Private Estimation and a Generalized Fingerprinting Lemma [10.176795938619417]
We prove new lower bounds for statistical estimation tasks under the constraint of $(\varepsilon, \delta)$-differential privacy.
We show that estimating the covariance matrix of a Gaussian in Frobenius norm requires $\Omega(d^2)$ samples, and in spectral norm requires $\Omega(d^{3/2})$ samples, both matching upper bounds up to logarithmic factors.
arXiv Detail & Related papers (2022-05-17T17:55:10Z)
- Nonparametric extensions of randomized response for private confidence sets [51.75485869914048]
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP).
We present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data.
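As background for this entry, here is a minimal sketch of classical binary randomized response, the primitive the paper extends, together with a debiased mean estimate and a crude Hoeffding interval; the paper's nonparametric CIs and confidence sequences are tighter and more general than this stand-in.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip
    it. This satisfies epsilon-LDP for a single binary value."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def debiased_mean_ci(reports, epsilon, alpha=0.05):
    """Unbiased estimate of the population mean of the true bits, with a
    Hoeffding confidence interval (a crude stand-in for the paper's CIs)."""
    n = len(reports)
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    # E[report] = (1-p) + mu*(2p-1), so mu = (mean - (1-p)) / (2p - 1).
    raw = sum(reports) / n
    mu_hat = (raw - (1.0 - p)) / (2.0 * p - 1.0)
    # Hoeffding: the raw mean deviates by sqrt(log(2/alpha)/(2n)); the
    # debiasing rescales that width by 1/(2p-1).
    half_width = math.sqrt(math.log(2.0 / alpha) / (2.0 * n)) / (2.0 * p - 1.0)
    return mu_hat - half_width, mu_hat + half_width

# Illustrative usage: 1000 users whose true bits have mean 0.3.
true_bits = [1 if random.random() < 0.3 else 0 for _ in range(1000)]
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
print(debiased_mean_ci(reports, epsilon=1.0))
```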
arXiv Detail & Related papers (2022-02-17T16:04:49Z)
- Infinitely Divisible Noise in the Low Privacy Regime [9.39772079241093]
Federated learning, in which training data is distributed among users and never shared, has emerged as a popular approach to privacy-preserving machine learning.
We present the first infinitely divisible noise distribution for real-valued data that achieves $\varepsilon$-differential privacy.
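For intuition about why infinite divisibility matters in federated settings: the classical Laplace distribution is itself infinitely divisible, so Laplace noise for a central $\varepsilon$-DP aggregate can be split into per-user Gamma-difference shares. Below is a sketch of that classical fact, not of the paper's new distribution for the low-privacy regime.

```python
import numpy as np

def laplace_shares(n_users, scale, rng):
    """Each user contributes Gamma(1/n, scale) - Gamma(1/n, scale).
    The sum of the n shares is exactly Laplace(scale): a sum of n i.i.d.
    Gamma(1/n, b) variables is Gamma(1, b) = Exponential(b), and the
    difference of two independent Exponential(b) variables is Laplace(b).
    So the aggregate receives Laplace noise even though no single user
    adds Laplace noise on their own."""
    g1 = rng.gamma(shape=1.0 / n_users, scale=scale, size=n_users)
    g2 = rng.gamma(shape=1.0 / n_users, scale=scale, size=n_users)
    return g1 - g2

rng = np.random.default_rng(0)
shares = laplace_shares(n_users=100, scale=1.0, rng=rng)
total_noise = shares.sum()  # distributed exactly as Laplace(scale=1.0)
```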
arXiv Detail & Related papers (2021-10-13T08:16:43Z)
- Privately Learning Mixtures of Axis-Aligned Gaussians [12.77304082363491]
We show that $\widetilde O(k^2 d \log^{3/2}(1/\delta) / (\alpha^2 \varepsilon))$ samples are sufficient to learn a mixture of $k$ axis-aligned Gaussians.
We design a new technique for privately learning mixture distributions.
arXiv Detail & Related papers (2021-06-03T22:34:23Z)
- On the Intrinsic Differential Privacy of Bagging [69.70602220716718]
Our experimental results demonstrate that Bagging achieves significantly higher accuracies than state-of-the-art differentially private machine learning methods with the same privacy budgets.
arXiv Detail & Related papers (2020-08-22T14:17:55Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde O(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)
- Private Query Release Assisted by Public Data [96.6174729958211]
We study the problem of differentially private query release assisted by access to public data.
We show that we can solve the problem for any query class $\mathcal H$ of finite VC-dimension using only $d/\alpha$ public samples and $\sqrt{p}\, d^{3/2}/\alpha^2$ private samples, where $d$ and $p$ denote the VC-dimension and dual VC-dimension of $\mathcal H$, respectively.
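As a baseline for this entry, here is a minimal sketch of the simplest pure-DP query-release primitive: the Laplace mechanism under basic composition. Its noise scale grows linearly in the number of queries, which is exactly what public data and more sophisticated mechanisms improve on. The threshold queries in the usage lines are an illustrative assumption.

```python
import numpy as np

def laplace_query_release(private_data, queries, epsilon):
    """Answer k counting queries on a private dataset under epsilon-DP.
    Each query maps a record to {0, 1}; changing one record moves each
    normalized answer by at most 1/n, so the L1 sensitivity across all k
    answers is k/n, and Laplace(k/(n*epsilon)) noise per answer suffices
    (basic composition -- far from optimal for large k)."""
    n = len(private_data)
    k = len(queries)
    answers = np.array([sum(q(x) for x in private_data) / n for q in queries])
    noise = np.random.laplace(scale=k / (n * epsilon), size=k)
    return answers + noise

# Illustrative usage: threshold queries on 1-D data.
data = [0.1, 0.4, 0.35, 0.8, 0.55]
queries = [lambda x, t=t: float(x <= t) for t in (0.25, 0.5, 0.75)]
print(laplace_query_release(data, queries, epsilon=1.0))
```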
arXiv Detail & Related papers (2020-04-23T02:46:37Z)