Minimax optimal goodness-of-fit testing for densities and multinomials under a local differential privacy constraint
- URL: http://arxiv.org/abs/2002.04254v3
- Date: Thu, 15 Apr 2021 07:12:21 GMT
- Title: Minimax optimal goodness-of-fit testing for densities and multinomials under a local differential privacy constraint
- Authors: Joseph Lam-Weil, Béatrice Laurent, Jean-Michel Loubes
- Abstract summary: We consider the consequences of local differential privacy constraints on goodness-of-fit testing.
We present a test that is adaptive to the smoothness parameter of the unknown density and remains minimax optimal up to a logarithmic factor.
- Score: 3.265773263570237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding anonymization mechanisms to protect personal data is at the heart of
recent machine learning research. Here, we consider the consequences of local
differential privacy constraints on goodness-of-fit testing, i.e., the
statistical problem of assessing whether sample points are generated from a
fixed density $f_0$ or not. The observations are kept hidden and replaced by a
stochastic transformation satisfying the local differential privacy constraint.
In this setting, we propose a testing procedure which is based on an estimation
of the quadratic distance between the density $f$ of the unobserved samples and
$f_0$. We establish an upper bound on the separation distance associated with
this test, and a matching lower bound on the minimax separation rates of
testing under non-interactive privacy in the case that $f_0$ is uniform, in
discrete and continuous settings. To the best of our knowledge, we provide the
first minimax optimal test and associated private transformation under a local
differential privacy constraint over Besov balls in the continuous setting,
quantifying the price to pay for data privacy. We also present a test that is
adaptive to the smoothness parameter of the unknown density and remains minimax
optimal up to a logarithmic factor. Finally, we note that our results can be
translated to the discrete case, where the treatment of probability vectors is
shown to be equivalent to that of piecewise constant densities in our setting.
That is why we work with a unified setting for both the continuous and the
discrete cases.
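To make the recipe concrete in the discrete (piecewise-constant) case, the sketch below privatizes each sample's one-hot bin indicator with Laplace noise, a standard non-interactive alpha-LDP channel, and estimates the quadratic distance to $f_0$ with a pairwise U-statistic. This is a minimal illustration, not the authors' exact construction; the function names and the Monte Carlo threshold calibration are assumptions made for the example.

```python
import numpy as np

def privatize(samples, d, alpha, rng):
    """Release each sample's one-hot bin indicator plus Laplace noise.

    Changing one sample moves the one-hot vector by at most 2 in L1 norm,
    so Laplace noise of scale 2/alpha yields an alpha-LDP channel.
    """
    one_hot = np.eye(d)[samples]  # shape (n, d)
    return one_hot + rng.laplace(scale=2.0 / alpha, size=one_hot.shape)

def quadratic_distance_stat(z, p0):
    """Unbiased U-statistic estimate of ||p - p0||_2^2 from private views.

    E[z_i] = p under the channel, so products over distinct pairs of
    centered views are unbiased for ||p - p0||_2^2; diagonal terms are
    excluded because they carry the (biased) noise variance.
    """
    n, _ = z.shape
    c = z - p0
    s = c.sum(axis=0)
    return (s @ s - np.einsum("ij,ij->", c, c)) / (n * (n - 1))

def null_threshold(d, n, alpha, p0, rng, level=0.05, reps=200):
    """Monte Carlo (1 - level)-quantile of the statistic under H0: p = p0."""
    stats = [
        quadratic_distance_stat(
            privatize(rng.integers(0, d, size=n), d, alpha, rng), p0
        )
        for _ in range(reps)
    ]
    return np.quantile(stats, 1 - level)

rng = np.random.default_rng(0)
d, n, alpha = 8, 5000, 1.0
p0 = np.full(d, 1.0 / d)              # uniform null hypothesis
threshold = null_threshold(d, n, alpha, p0, rng)
samples = rng.integers(0, d, size=n)  # here the data really follow p0
stat = quadratic_distance_stat(privatize(samples, d, alpha, rng), p0)
print(f"statistic = {stat:.5f}, reject H0: {stat > threshold}")
```

In the continuous Besov setting the same idea would presumably be applied to Laplace-perturbed coefficients of a projection onto a piecewise-constant or wavelet basis, with the resolution chosen from the smoothness parameter.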
Related papers
- Minimax And Adaptive Transfer Learning for Nonparametric Classification under Distributed Differential Privacy Constraints [6.042269506496206]
We first establish the minimax misclassification rate, precisely characterizing the effects of privacy constraints, source samples, and target samples on classification accuracy.
The results reveal interesting phase transition phenomena and highlight the intricate trade-offs between preserving privacy and achieving classification accuracy.
arXiv Detail & Related papers (2024-06-28T17:55:41Z)
- Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints [5.3595271893779906]
We study federated learning for nonparametric regression in the context of distributed samples across different servers.
Our findings shed light on the trade-off between statistical accuracy and privacy preservation.
arXiv Detail & Related papers (2024-06-10T19:34:07Z)
- Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests [5.3595271893779906]
Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations.
We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
arXiv Detail & Related papers (2024-06-10T19:25:19Z)
- Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z)
- Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze them under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z)
- On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion.
We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high.
Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z)
- Optimal Algorithms for Mean Estimation under Local Differential Privacy [55.32262879188817]
We show that PrivUnit achieves the optimal variance among a large family of locally private randomizers.
We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees.
arXiv Detail & Related papers (2022-05-05T06:43:46Z)
- Uniformity Testing in the Shuffle Model: Simpler, Better, Faster [0.0]
Uniformity testing, or testing whether independent observations are uniformly distributed, is the prototypical question in distribution testing.
In this work, we considerably simplify the analysis of the known uniformity testing algorithm in the shuffle model.
arXiv Detail & Related papers (2021-08-20T03:43:12Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
- Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
- Privacy Preserving Recalibration under Domain Shift [119.21243107946555]
We introduce a framework that abstracts out the properties of recalibration problems under differential privacy constraints.
We also design a novel recalibration algorithm, accuracy temperature scaling, that outperforms prior work on private datasets.
arXiv Detail & Related papers (2020-08-21T18:43:37Z)