Private Estimation with Public Data
- URL: http://arxiv.org/abs/2208.07984v2
- Date: Thu, 6 Apr 2023 00:36:23 GMT
- Title: Private Estimation with Public Data
- Authors: Alex Bie, Gautam Kamath, Vikrant Singhal
- Abstract summary: We study differentially private (DP) estimation with access to a small amount of public data.
We show that under the constraints of pure or concentrated DP, d+1 public data samples are sufficient to remove any dependence on the range parameters of the private data distribution.
- Score: 10.176795938619417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate the study of differentially private (DP) estimation with access
to a small amount of public data. For private estimation of d-dimensional
Gaussians, we assume that the public data comes from a Gaussian that may have
vanishing similarity in total variation distance with the underlying Gaussian
of the private data. We show that under the constraints of pure or concentrated
DP, d+1 public data samples are sufficient to remove any dependence on the
range parameters of the private data distribution from the private sample
complexity, which is known to be otherwise necessary without public data. For
separated Gaussian mixtures, we assume that the underlying public and private
distributions are the same, and we consider two settings: (1) when given a
dimension-independent amount of public data, the private sample complexity can
be improved polynomially in terms of the number of mixture components, and any
dependence on the range parameters of the distribution can be removed in the
approximate DP case; (2) when given an amount of public data linear in the
dimension, the private sample complexity can be made independent of range
parameters even under concentrated DP, and additional improvements can be made
to the overall sample complexity.
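The public-data-assisted idea in the abstract can be sketched informally: a handful of public samples fix a coarse center and scale, private points are clipped into the resulting ball, and a noised mean is released, removing any dependence on a priori range parameters. The sketch below is a hedged illustration, not the paper's estimator; the function name, the radius heuristic, and the sensitivity bound are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean_with_public_data(private_x, public_x, eps):
    """Hedged sketch of public-data-assisted DP mean estimation.
    Not the paper's algorithm; constants are illustrative heuristics."""
    # Coarse location and scale estimated from the few public samples.
    center = public_x.mean(axis=0)
    radius = 4.0 * public_x.std()
    # Clip private points into a ball around the public center so that
    # any single record's contribution to the mean is bounded.
    diffs = private_x - center
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = center + diffs * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    n = len(private_x)
    # Illustrative sensitivity bound: replacing one clipped point moves the
    # mean by at most 2*radius/n in L2; a careful analysis would track the
    # per-coordinate L1 sensitivity instead.
    noise = rng.laplace(scale=2.0 * radius / (n * eps), size=center.shape)
    return clipped.mean(axis=0) + noise
```

The key point the sketch reflects is that the clipping radius comes entirely from the public data, so no range parameter of the private distribution enters the mechanism.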
Related papers
- Statistical Inference for Privatized Data with Unknown Sample Size [7.933465724913661]
We develop both theory and algorithms to analyze privatized data in the unbounded differential privacy (DP) setting.
We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity.
arXiv Detail & Related papers (2024-06-10T13:03:20Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
- Differentially Private Sampling from Distributions [1.452875650827562]
In some regimes, private sampling requires fewer observations than learning a description of $P$ nonprivately.
For some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
arXiv Detail & Related papers (2022-11-15T14:56:42Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Nonparametric extensions of randomized response for private confidence sets [51.75485869914048]
This work derives methods for performing nonparametric, nonasymptotic statistical inference for population means under the constraint of local differential privacy (LDP).
We present confidence intervals (CI) and time-uniform confidence sequences (CS) for $\mu^\star$ when only given access to the privatized data.
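A minimal version of LDP mean estimation with a confidence interval can be sketched as follows: each user perturbs their own bounded value with Laplace noise before reporting, and the analyst averages the reports and forms a normal-approximation CI. This is a hedged illustration of the setting, not the paper's nonparametric, time-uniform construction; the function name and the variance bound are assumptions made here.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def ldp_mean_ci(x, eps, z_alpha=1.96):
    """Toy LDP mean estimate with a normal-approximation 95% CI.
    In a real deployment each user adds noise on-device; here the
    per-user noise is simulated centrally for illustration."""
    # Each user holds a value in [0, 1], so sensitivity is 1 and
    # Laplace(1/eps) noise gives eps-LDP per report.
    clipped = np.clip(x, 0.0, 1.0)
    reports = clipped + rng.laplace(scale=1.0 / eps, size=len(clipped))
    est = reports.mean()
    # Var(report) <= 1/4 + 2/eps^2 for values in [0, 1].
    sigma = math.sqrt(0.25 + 2.0 / eps**2)
    half = z_alpha * sigma / math.sqrt(len(clipped))
    return est, (est - half, est + half)
```

The widening of the interval by the $2/\varepsilon^2$ term makes explicit the statistical price of local privatization relative to a non-private CI.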
arXiv Detail & Related papers (2022-02-17T16:04:49Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
- Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z)
- Compressive Privatization: Sparse Distribution Estimation under Locally Differentially Privacy [18.43218511751587]
We show that as long as the target distribution is sparse or approximately sparse, the number of samples needed could be significantly reduced.
Our mechanism does privatization and dimensionality reduction simultaneously, and the sample complexity will only depend on the reduced dimensionality.
arXiv Detail & Related papers (2020-12-03T17:14:23Z)
- Bounding, Concentrating, and Truncating: Unifying Privacy Loss Composition for Data Analytics [2.614355818010333]
We provide strong privacy loss bounds when an analyst may select pure DP, bounded range (e.g. exponential mechanisms) or concentrated DP mechanisms in any order.
We also provide optimal privacy loss bounds that apply when an analyst can select pure DP and bounded range mechanisms in a batch.
arXiv Detail & Related papers (2020-04-15T17:33:10Z)
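The accounting pattern studied in the composition entry above can be illustrated with plain basic composition, under which epsilons and deltas simply add across mechanisms. The ledger below is a toy sketch, far looser than the adaptive and batch bounds the paper proves; `PrivacyLedger` and its interface are names invented here for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class PrivacyLedger:
    """Toy privacy-loss tracker using basic composition (losses add).
    Real accountants (e.g. the paper's bounds) are much tighter."""
    eps: float = 0.0
    delta: float = 0.0
    events: list = field(default_factory=list)

    def spend(self, eps, delta=0.0, mechanism="pure-DP"):
        # Basic composition: total loss is the sum over all mechanisms,
        # regardless of the order the analyst selects them in.
        self.eps += eps
        self.delta += delta
        self.events.append((mechanism, eps, delta))
        return self.eps, self.delta

ledger = PrivacyLedger()
ledger.spend(0.5, mechanism="exponential")    # a bounded-range mechanism
ledger.spend(0.3, 1e-6, mechanism="gaussian")  # an approximate-DP mechanism
```

The point of the bounded-range and concentrated-DP analyses in the paper is precisely that this naive additive ledger is pessimistic: the same sequence of mechanisms can be certified with a smaller total loss.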
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.