Strongly universally consistent nonparametric regression and
classification with privatised data
- URL: http://arxiv.org/abs/2011.00216v1
- Date: Sat, 31 Oct 2020 09:00:43 GMT
- Title: Strongly universally consistent nonparametric regression and
classification with privatised data
- Authors: Thomas Berrett, László Györfi, Harro Walk
- Abstract summary: We revisit the classical problem of nonparametric regression, but impose local differential privacy constraints.
We design a novel estimator of the regression function, which can be viewed as a privatised version of the well-studied partitioning regression estimator.
- Score: 2.879036956042183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we revisit the classical problem of nonparametric regression,
but impose local differential privacy constraints. Under such constraints, the
raw data $(X_1,Y_1),\ldots,(X_n,Y_n)$, taking values in $\mathbb{R}^d \times
\mathbb{R}$, cannot be directly observed, and all estimators are functions of
the randomised output from a suitable privacy mechanism. The statistician is
free to choose the form of the privacy mechanism, and here we add Laplace
distributed noise to a discretisation of the location of a feature vector $X_i$
and to the value of its response variable $Y_i$. Based on this randomised data,
we design a novel estimator of the regression function, which can be viewed as
a privatised version of the well-studied partitioning regression estimator. The
main result is that the estimator is strongly universally consistent. Our
methods and analysis also give rise to a strongly universally consistent binary
classification rule for locally differentially private data.
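As a rough illustration of the mechanism described above, here is a minimal one-dimensional sketch (not the paper's exact construction): each user releases a Laplace-noised bin-membership vector for a partition of $[0,1]$ together with a Laplace-noised truncated response vector, and the estimator is a ratio of noisy sums over the bin containing the query point. The bin count $k$, truncation level $T$, privacy budget split, and sample size are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatise(x, y, bins, eps, T):
    # One user's eps-LDP release: a Laplace-noised bin-indicator vector and a
    # Laplace-noised truncated response vector, each using eps/2 of the budget.
    k = len(bins) - 1
    j = min(max(int(np.searchsorted(bins, x, side="right")) - 1, 0), k - 1)
    ind = np.zeros(k)
    ind[j] = 1.0
    resp = np.clip(y, -T, T) * ind
    z = ind + rng.laplace(scale=4.0 / eps, size=k)       # L1 sensitivity 2
    w = resp + rng.laplace(scale=4.0 * T / eps, size=k)  # L1 sensitivity 2T
    return z, w

def fit_predict(X, Y, x0, k=20, eps=2.0, T=2.0):
    # Partitioning regression estimate at x0, computed from privatised data only.
    bins = np.linspace(0.0, 1.0, k + 1)
    Z, W = np.zeros(k), np.zeros(k)
    for x, y in zip(X, Y):
        z, w = privatise(x, y, bins, eps, T)
        Z += z
        W += w
    j = min(max(int(np.searchsorted(bins, x0, side="right")) - 1, 0), k - 1)
    return W[j] / max(Z[j], 1.0)  # floor the denominator for stability

# toy check on m(x) = sin(2*pi*x); LDP noise is substantial, so n must be large
n = 100_000
X = rng.uniform(size=n)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.2, size=n)
print(fit_predict(X, Y, x0=0.25))  # roughly near sin(pi/2) = 1
```

In the paper, strong universal consistency requires the number of cells and the truncation level to grow with $n$ at suitable rates; the fixed values above are only for demonstration.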
Related papers
- On Differentially Private U Statistics [25.683071759227293]
We propose a new thresholding-based approach using local Hájek projections to reweight different subsets of the data.
This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
arXiv Detail & Related papers (2024-07-06T03:27:14Z) - Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares [38.478776450327125]
We present a sample- and time-efficient differentially private algorithm for ordinary least squares.
Our near-optimal accuracy guarantee holds for any dataset, avoiding the dependence on the condition number or the exponential running time required by prior methods.
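For context, a hedged sketch of the classic sufficient-statistics-perturbation baseline that this line of work improves on: perturb $X^\top X$ and $X^\top y$ with Gaussian noise and solve the noisy normal equations. The row and response bounds are assumptions of the sketch; this is the textbook baseline whose error degrades with the condition number, not the paper's ISP estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssp_ols(X, y, eps, delta, bound_x, bound_y):
    # Sufficient-statistics perturbation for OLS, assuming each row of X has
    # L2 norm <= bound_x and |y_i| <= bound_y (add/remove neighbouring datasets).
    d = X.shape[1]
    sens_xtx = bound_x ** 2          # ||x x^T||_F <= bound_x^2
    sens_xty = bound_x * bound_y     # ||x * y||_2 <= bound_x * bound_y
    # each statistic gets (eps/2, delta/2) via the Gaussian mechanism
    noise_mult = np.sqrt(2 * np.log(2.5 / delta)) / (eps / 2)
    A = X.T @ X + rng.normal(scale=noise_mult * sens_xtx, size=(d, d))
    A = (A + A.T) / 2                          # symmetrise the noisy Gram matrix
    b = X.T @ y + rng.normal(scale=noise_mult * sens_xty, size=d)
    return np.linalg.solve(A + np.eye(d), b)   # small ridge for numerical stability
```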
arXiv Detail & Related papers (2024-04-23T18:00:38Z) - On Rate-Optimal Partitioning Classification from Observable and from
Privatised Data [0.0]
We revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions.
The privacy constraints mean that the data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be directly observed.
We add Laplace distributed noise to the discretisations of all possible locations of the feature vector $X_i$ and to its label $Y_i$.
arXiv Detail & Related papers (2023-12-22T18:07:18Z) - Differentially Private Statistical Inference through $\beta$-Divergence
One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise estimation for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z) - General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean
Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as a suitably chosen Gaussian noise mechanism.
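A minimal sketch of the Gaussian-noise approach under concentrated DP (zCDP): clip the data, compute the mean, and add Gaussian noise with $\sigma = \Delta/\sqrt{2\rho}$, where $\Delta$ is the sensitivity of the clipped mean. The clipping range and $\rho$ are illustrative; note that clipping can bias the estimate, whereas the paper studies exactly unbiased estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

def zcdp_gaussian_mean(x, rho, lo, hi):
    # Mean of data clipped to [lo, hi] plus Gaussian noise satisfying rho-zCDP:
    # the clipped mean has sensitivity (hi - lo) / n, and the Gaussian mechanism
    # with sigma = sensitivity / sqrt(2 * rho) is rho-zCDP.
    n = len(x)
    sens = (hi - lo) / n
    sigma = sens / np.sqrt(2 * rho)
    return np.clip(x, lo, hi).mean() + rng.normal(scale=sigma)

print(zcdp_gaussian_mean(rng.normal(size=1000), rho=0.5, lo=-3.0, hi=3.0))
```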
arXiv Detail & Related papers (2023-01-31T18:47:42Z) - Regression with Label Differential Privacy [64.21020761920322]
We derive a label DP randomization mechanism that is optimal under a given regression loss function.
We prove that the optimal mechanism takes the form of a "randomized response on bins".
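The generic form of such a mechanism is $k$-ary randomized response on a binned label: report the true bin with probability $e^\varepsilon/(e^\varepsilon + k - 1)$ and a uniformly random other bin otherwise. The sketch below shows only this generic construction; the paper's contribution is choosing the bins and response distribution optimally for a given regression loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def rr_on_bins(y, bin_edges, eps):
    # k-ary randomized response on a binned label; the released bin index is
    # eps-DP with respect to the label y.
    k = len(bin_edges) - 1
    j = min(max(int(np.searchsorted(bin_edges, y, side="right")) - 1, 0), k - 1)
    p_true = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_true:
        return j
    return int(rng.choice([b for b in range(k) if b != j]))

# example: labels in [0, 10] split into 5 bins, eps = 1
print(rr_on_bins(3.7, np.linspace(0.0, 10.0, 6), eps=1.0))
```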
arXiv Detail & Related papers (2022-12-12T17:41:32Z) - $p$-Generalized Probit Regression and Scalable Maximum Likelihood
Estimation via Sketching and Coresets [74.37849422071206]
We study the $p$-generalized probit regression model, which is a generalized linear model for binary responses.
We show how the maximum likelihood estimator for $p$-generalized probit regression can be approximated efficiently up to a factor of $(1+\varepsilon)$ on large data.
arXiv Detail & Related papers (2022-03-25T10:54:41Z) - Nonparametric extensions of randomized response for private confidence sets [51.75485869914048]
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP).
We present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data.
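For comparison, the simplest LDP mean estimate adds Laplace noise to each bounded observation and forms a CLT-based interval from the known noise variance; the sketch below is this generic construction, not the paper's nonparametric randomized-response mechanism or its time-uniform confidence sequences.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

def ldp_mean_ci(x, eps, alpha=0.05):
    # Each user releases x_i + Laplace(1/eps) for x_i in [0, 1], which is
    # eps-LDP since the data have sensitivity 1. The noise has variance
    # 2 / eps^2 and Var(x_i) <= 1/4, giving a CLT-based interval.
    n = len(x)
    z = np.clip(x, 0.0, 1.0) + rng.laplace(scale=1.0 / eps, size=n)
    sd = np.sqrt((0.25 + 2.0 / eps ** 2) / n)
    half = NormalDist().inv_cdf(1 - alpha / 2) * sd
    return z.mean() - half, z.mean() + half

lo, hi = ldp_mean_ci(rng.uniform(size=10_000), eps=1.0)
print(lo, hi)  # should cover the true mean 0.5 roughly 95% of the time
```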
arXiv Detail & Related papers (2022-02-17T16:04:49Z) - Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware
Regression [91.3373131262391]
Uncertainty is the only certainty there is.
Traditionally, the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions.
How to model the uncertainty within the present-day technologies for regression remains an open issue.
arXiv Detail & Related papers (2021-03-25T06:56:09Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Distributionally-Robust Machine Learning Using Locally
Differentially-Private Data [14.095523601311374]
We consider machine learning, particularly regression, using locally-differentially private datasets.
We show that machine learning with locally-differentially private datasets can be rewritten as a distributionally-robust optimization.
arXiv Detail & Related papers (2020-06-24T05:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.