Best Linear Unbiased Estimate from Privatized Histograms
- URL: http://arxiv.org/abs/2409.04387v3
- Date: Tue, 29 Oct 2024 15:49:09 GMT
- Title: Best Linear Unbiased Estimate from Privatized Histograms
- Authors: Jordan Awan, Adam Edwards, Paul Bartholomew, Andrew Sillers,
- Abstract summary: In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs.
We show that the minimum variance processing is a linear projection.
We propose the Scalable Algorithm Efficient for Best Linear Unbiased Estimate (SEA BLUE)
- Score: 6.17477133700348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs, in the sense that a quantity can be estimated by combining different combinations of privatized values. Indeed, this structure is present in the DP 2020 Decennial Census products published by the U.S. Census Bureau. With this structure, the DP output can be improved by enforcing self-consistency (i.e., estimators obtained by combining different values result in the same estimate) and we show that the minimum variance processing is a linear projection. However, standard projection algorithms are too computationally expensive in terms of both memory and execution time for applications such as the Decennial Census. We propose the Scalable Efficient Algorithm for Best Linear Unbiased Estimate (SEA BLUE), based on a two step process of aggregation and differencing that 1) enforces self-consistency through a linear and unbiased procedure, 2) is computationally and memory efficient, 3) achieves the minimum variance solution under certain structural assumptions, and 4) is empirically shown to be robust to violations of these structural assumptions. We propose three methods of calculating confidence intervals from our estimates, under various assumptions. We apply SEA BLUE to two 2010 Census demonstration products, illustrating its scalability and validity.
Related papers
- The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data [10.893644207618825]
We present a measurement study of four discretization strategies in the context of Differentially Private (DP) generative marginal models.
We find that optimizing both the choice of discretizer and bin count can improve utility, on average, by almost 30% across six DP marginal models.
arXiv Detail & Related papers (2025-04-09T14:30:30Z) - Data value estimation on private gradients [84.966853523107]
For gradient-based machine learning (ML) methods, the de facto differential privacy technique is perturbing the gradients with random noise.
Data valuation attributes the ML performance to the training data and is widely used in privacy-aware applications that require enforcing DP.
We show that the answer is no with the default approach of injecting i.i.d.random noise to the gradients because the estimation uncertainty of the data value estimation paradoxically linearly scales with more estimation budget.
We propose to instead inject carefully correlated noise to provably remove the linear scaling of estimation uncertainty w.r.t.the budget.
arXiv Detail & Related papers (2024-12-22T13:15:51Z) - Full-Information Estimation For Hierarchical Data [0.43512163406552007]
The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements.
These noisy measurements are observed in a set of hierarchical geographic units, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks.
This paper describes a method to leverage the hierarchical structure within these noisy measurements to compute confidence intervals for arbitrary tabulations.
arXiv Detail & Related papers (2024-04-19T20:18:16Z) - Online Estimation via Offline Estimation: An Information-Theoretic Framework [75.80823630681323]
Motivated by connections between estimation and interactive decision making, we ask: is it possible to convert offline estimation algorithms into online estimation algorithms?
We introduce a new framework, Oracle-Efficient Online Estimation (OEOE), where the learner can only interact with the data stream indirectly through a sequence of offline estimators produced by a black-box algorithm operating on the stream.
arXiv Detail & Related papers (2024-04-15T20:19:18Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Bounded and Unbiased Composite Differential Privacy [25.427802467876248]
The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between two neighboring databases.
Existing solutions attempt to address this issue by employing post-processing or truncation techniques.
We propose a novel differentially private mechanism which uses a composite probability density function to generate bounded and unbiased outputs.
arXiv Detail & Related papers (2023-11-04T04:43:47Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP)
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Differentially Private Estimation via Statistical Depth [0.0]
Two notions of statistical depth are used to motivate new approximate DP location and regression estimators.
To avoid requiring that users specify a priori bounds on the estimates and/or the observations, variants of these DP mechanisms are described.
arXiv Detail & Related papers (2022-07-26T01:59:07Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Adaptive Estimation and Uniform Confidence Bands for Nonparametric
Structural Functions and Elasticities [2.07706336594149]
We introduce two data-driven procedures for optimal estimation and inference in nonparametric models.
We estimate the elasticity of the intensive margin of firm exports in a monopolistic competition model of international trade.
arXiv Detail & Related papers (2021-07-25T18:46:33Z) - On the Practicality of Differential Privacy in Federated Learning by
Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization.
We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z) - Online Covariance Matrix Estimation in Stochastic Gradient Descent [10.153224593032677]
gradient descent (SGD) is widely used for parameter estimation especially for huge data sets and online learning.
This paper aims at quantifying statistical inference of SGD-based estimates in an online setting.
arXiv Detail & Related papers (2020-02-10T17:46:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.