Locally differentially private estimation of nonlinear functionals of
discrete distributions
- URL: http://arxiv.org/abs/2107.03940v2
- Date: Sat, 12 Aug 2023 09:50:20 GMT
- Title: Locally differentially private estimation of nonlinear functionals of
discrete distributions
- Authors: Cristina Butucea and Yann Issartel
- Abstract summary: We study the problem of estimating non-linear functionals of discrete distributions in the context of local differential privacy.
Only $\alpha$-locally differentially private (LDP) samples are publicly available, where the term 'local' means that each $z_i$ is produced using one individual $x_i$.
We describe the behavior of the quadratic risk for estimating the power sum functional $F_{\gamma} = \sum_{k=1}^K p_k^{\gamma}$, $\gamma > 0$, as a function of $K$, $n$ and $\alpha$.
- Score: 9.028773906859541
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of estimating non-linear functionals of discrete
distributions in the context of local differential privacy. The initial data
$x_1,\ldots,x_n \in [K]$ are assumed to be i.i.d. and distributed according to an
unknown discrete distribution $p = (p_1,\ldots,p_K)$. Only $\alpha$-locally
differentially private (LDP) samples $z_1,\ldots,z_n$ are publicly available,
where the term 'local' means that each $z_i$ is produced using one individual
attribute $x_i$. We exhibit privacy mechanisms (PM) that are interactive (i.e.
they are allowed to use already published confidential data) or
non-interactive. We describe the behavior of the quadratic risk for estimating
the power sum functional $F_{\gamma} = \sum_{k=1}^K p_k^{\gamma}$, $\gamma >0$
as a function of $K, \, n$ and $\alpha$. In the non-interactive case, we study
two plug-in type estimators of $F_{\gamma}$, for all $\gamma >0$, that are
similar to the MLE analyzed by Jiao et al. (2017) in the multinomial model.
However, due to the privacy constraint the rates we attain are slower and
similar to those obtained in the Gaussian model by Collier et al. (2020). In
the interactive case, we introduce for all $\gamma >1$ a two-step procedure
which attains the faster parametric rate $(n \alpha^2)^{-1/2}$ when $\gamma
\geq 2$. We give lower bound results over all $\alpha$-LDP mechanisms and all
estimators using the private samples.
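The paper's mechanisms and estimators are more refined than what fits here, but the non-interactive plug-in idea can be illustrated with a minimal sketch. The sketch below assumes a specific simple $\alpha$-LDP release (Laplace noise added to one-hot encodings, whose $\ell_1$-sensitivity is 2), averages the releases into an estimate of $p$, and plugs it into $F_{\gamma} = \sum_k p_k^{\gamma}$; all function names and the clipping/renormalization step are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def laplace_ldp_release(x, K, alpha, rng):
    """Privatized release z_i for sample x_i in {0, ..., K-1}.

    Adds Laplace(2/alpha) noise to each coordinate of the one-hot encoding.
    Two one-hot vectors differ by 2 in l1 norm, so this is alpha-LDP.
    """
    e = np.zeros(K)
    e[x] = 1.0
    return e + rng.laplace(scale=2.0 / alpha, size=K)

def plugin_power_sum(Z, gamma):
    """Plug-in estimate of F_gamma = sum_k p_k^gamma from private releases Z.

    Averages the noisy releases (unbiased for p), clips to [0, 1] and
    renormalizes so the plug-in is well defined, then applies the functional.
    """
    p_hat = Z.mean(axis=0)
    p_hat = np.clip(p_hat, 0.0, 1.0)
    if p_hat.sum() > 0:
        p_hat = p_hat / p_hat.sum()
    return float(np.sum(p_hat ** gamma))

# Toy illustration: uniform p on K = 5 symbols, so F_2 = 1/K = 0.2.
rng = np.random.default_rng(0)
K, n, alpha, gamma = 5, 20000, 1.0, 2.0
p = np.ones(K) / K
x = rng.choice(K, size=n, p=p)
Z = np.stack([laplace_ldp_release(xi, K, alpha, rng) for xi in x])
print(plugin_power_sum(Z, gamma))  # typically close to 0.2 at this sample size
```

Note how the noise scale $2/\alpha$ inflates the variance of $\hat{p}$ by a factor of order $1/\alpha^2$ relative to the non-private multinomial setting, which is one intuition for the slower rates in $n\alpha^2$ that the paper establishes.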
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - On Computing Pairwise Statistics with Local Differential Privacy [55.81991984375959]
We study the problem of computing pairwise statistics, i.e., ones of the form $\binom{n}{2}^{-1} \sum_{i \ne j} f(x_i, x_j)$, where $x_i$ denotes the input to the $i$th user, with differential privacy (DP) in the local model.
This formulation captures important metrics such as Kendall's $\tau$ coefficient, Area Under Curve, Gini's mean difference, Gini's entropy, etc.
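As a non-private baseline for the statistic that paper privatizes, the pairwise form can be computed directly over unordered pairs; the example below (function names are illustrative) instantiates it with Gini's mean difference, one of the metrics the summary lists.

```python
from itertools import combinations

def pairwise_statistic(xs, f):
    """Average of a symmetric kernel f over all unordered pairs:
    binom(n, 2)^{-1} * sum_{i < j} f(x_i, x_j)."""
    pairs = list(combinations(xs, 2))
    return sum(f(a, b) for a, b in pairs) / len(pairs)

# Gini's mean difference as a pairwise statistic (no privacy applied).
xs = [1.0, 2.0, 4.0]
gmd = pairwise_statistic(xs, lambda a, b: abs(a - b))
print(gmd)  # (|1-2| + |1-4| + |2-4|) / 3 = 2.0
```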
arXiv Detail & Related papers (2024-06-24T04:06:09Z) - Private Mean Estimation with Person-Level Differential Privacy [6.621676316292624]
We study person-level differentially private mean estimation in the case where each person holds multiple samples.
We give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds hold for the most permissive case of approximate DP.
arXiv Detail & Related papers (2024-05-30T18:20:35Z) - Debiasing and a local analysis for population clustering using
semidefinite programming [1.9761774213809036]
We consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions.
This work is motivated by the application of clustering individuals according to their population of origin.
arXiv Detail & Related papers (2024-01-16T03:14:24Z) - Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
We show that a dataset of size $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^2 (1-\gamma)^4}\right)$ suffices to ensure that the Kolmogorov metric and total variation metric between $\hat{\eta}^{\pi}$ and $\eta^{\pi}$ are below $\epsilon$ with high probability.
Our findings give rise to a unified approach to statistical inference of a wide class of statistical functionals of $\eta^{\pi}$.
arXiv Detail & Related papers (2023-09-29T14:14:53Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP).
We prove an upper bound of $\widetilde{O}(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := \mathrm{sp}(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - Private Convex Optimization via Exponential Mechanism [16.867534746193833]
We show that modifying the exponential mechanism by adding an $\ell_2^2$ regularizer to $F(x)$ recovers both the known optimal empirical risk and population loss under $(\epsilon,\delta)$-DP.
We also show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for DP-SCO, where $n$ is the number of samples/users and $d$ is the ambient dimension.
arXiv Detail & Related papers (2022-03-01T06:51:03Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of ${\bf K}$ is independent of the distribution of the i.i.d. entries of ${\bf w}$.
We propose a novel random technique, called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.