Compressive Privatization: Sparse Distribution Estimation under Local Differential Privacy
- URL: http://arxiv.org/abs/2012.02081v1
- Date: Thu, 3 Dec 2020 17:14:23 GMT
- Title: Compressive Privatization: Sparse Distribution Estimation under Local Differential Privacy
- Authors: Zhongzheng Xiong, Zengfeng Huang, Xiaojun Mao, Jian Wang, Shan Ying
- Abstract summary: We show that as long as the target distribution is sparse or approximately sparse, the number of samples needed could be significantly reduced.
Our mechanism does privatization and dimensionality reduction simultaneously, and the sample complexity will only depend on the reduced dimensionality.
- Score: 18.43218511751587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of discrete distribution estimation under local
differential privacy. Distribution estimation is one of the most fundamental
estimation problems, which is widely studied in both non-private and private
settings. In the local model, private mechanisms with provably optimal sample
complexity are known. However, they are optimal only in the worst-case sense;
their sample complexity is proportional to the size of the entire universe,
which could be huge in practice (e.g., all IP addresses). We show that as long
as the target distribution is sparse or approximately sparse (e.g., highly
skewed), the number of samples needed could be significantly reduced. The
sample complexity of our new mechanism is characterized by the sparsity of the
target distribution and only weakly depends on the size of the universe. Our
mechanism does privatization and dimensionality reduction simultaneously, and
the sample complexity will only depend on the reduced dimensionality. The
original distribution is then recovered using tools from compressive sensing.
To complement our theoretical results, we conduct experimental studies, the
results of which clearly demonstrate the advantages of our method and confirm
our theoretical findings.
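The abstract does not spell out the mechanism, but the overall pipeline it describes (compress each user's one-hot report, privatize the low-dimensional vector, average, then decode with a sparse-recovery solver) can be sketched as follows. This is a minimal illustrative sketch, not the authors' construction: the ±1 sensing matrix, the per-user Laplace perturbation, the scikit-learn Lasso decoder, and all sizes (d, m, n, eps, k) are assumptions made for concreteness.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): universe size d, reduced dimension m,
# n users, privacy budget eps, sparsity k.
d, m, n, eps, k = 2_000, 200, 20_000, 1.0, 20

# Sparse ground-truth distribution supported on k items.
p = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
p[support] = rng.dirichlet(np.ones(k))

# Shared random sensing matrix with +-1/sqrt(m) entries (a standard compressive-sensing choice).
A = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)

# Each user holds one item x_i ~ p and reports a Laplace-perturbed copy of A @ e_{x_i},
# so privatization happens directly in the m-dimensional compressed domain.
items = rng.choice(d, size=n, p=p)
l1_sens = 2.0 * np.sqrt(m)                    # ||A e_x - A e_x'||_1 <= 2*sqrt(m) for this A
reports = A[:, items].T + rng.laplace(scale=l1_sens / eps, size=(n, m))

# Averaging the reports gives an unbiased estimate of the compressed distribution A @ p.
y = reports.mean(axis=0)

# Recover a sparse estimate of p by non-negative sparse regression, then renormalize.
lasso = Lasso(alpha=1e-3, fit_intercept=False, positive=True, max_iter=10_000)
lasso.fit(A, y)
p_hat = lasso.coef_
p_hat = p_hat / p_hat.sum() if p_hat.sum() > 0 else np.full(d, 1.0 / d)

print("L1 estimation error:", np.abs(p_hat - p).sum())
```

The paper's own privatization scheme and decoder differ and come with sample-complexity guarantees; the sketch only illustrates why the sample complexity can depend on the reduced dimension m rather than the universe size d.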
Related papers
- Exactly Minimax-Optimal Locally Differentially Private Sampling [12.587817635325266]
We define the fundamental privacy-utility trade-off (PUT) of private sampling in the minimax sense, using the f-divergence between the original and sampling distributions as the utility measure.
We characterize the exact PUT for both finite and continuous data spaces under mild conditions on the data distributions, and propose sampling mechanisms that are universally optimal for all f-divergences.
arXiv Detail & Related papers (2024-10-30T05:13:18Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax
Rates, Covariate Quantization, and Uniform Recovery [22.267482745621997]
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems.
We propose to truncate and properly dither the data prior to a uniform quantization; a minimal sketch of this truncate-dither-quantize step appears after this list.
arXiv Detail & Related papers (2022-12-30T06:28:30Z) - Differentially Private Sampling from Distributions [1.452875650827562]
In some regimes, private sampling requires fewer observations than learning a description of $P$ nonprivately.
For some classes of distributions, the overhead in the number of observations needed for private learning compared to non-private learning is completely captured by the number of observations needed for private sampling.
arXiv Detail & Related papers (2022-11-15T14:56:42Z) - Differentially private multivariate medians [4.588028371034407]
We develop novel finite-sample performance guarantees for differentially private depth-based medians.
We show that under Cauchy marginals, the cost of heavy-tailed location estimation outweighs the cost of privacy.
arXiv Detail & Related papers (2022-10-12T17:56:04Z) - Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We propose two strategies for reducing this gap: one uses gap estimation to guide the selection of a better hypothesis for the target,
and the other minimizes the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z) - BR-SNIS: Bias Reduced Self-Normalized Importance Sampling [11.150337082767862]
Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights.
We propose a new method, BR-SNIS, whose complexity is essentially the same as that of self-normalized importance sampling (SNIS) and which significantly reduces bias without increasing the variance; a minimal sketch of plain SNIS appears after this list.
We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds.
arXiv Detail & Related papers (2022-07-13T17:14:10Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable
Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.