Related papers: Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap

Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap

URL: http://arxiv.org/abs/2507.10746v1
Date: Mon, 14 Jul 2025 19:12:16 GMT
Title: Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap
Authors: Zhanyu Wang, Arin Chang, Jordan Awan,
Abstract summary: Existing usage of the parametric bootstrap on privatized data ignored or avoided handling the effect of clamping.<n>We propose using the indirect inference method to estimate the parameter values consistently.<n>Our framework produces confidence intervals with well-calibrated coverage and performs hypothesis testing with the correct type I error.
Score: 12.65121513620053
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We design a debiased parametric bootstrap framework for statistical inference from differentially private data. Existing usage of the parametric bootstrap on privatized data ignored or avoided handling the effect of clamping, a technique employed by the majority of privacy mechanisms. Ignoring the impact of clamping often leads to under-coverage of confidence intervals and miscalibrated type I errors of hypothesis tests. The main reason for the failure of the existing methods is the inconsistency of the parameter estimate based on the privatized data. We propose using the indirect inference method to estimate the parameter values consistently, and we use the improved estimator in parametric bootstrap for inference. To implement the indirect estimator, we present a novel simulation-based, adaptive approach along with the theory that establishes the consistency of the corresponding parametric bootstrap estimates, confidence intervals, and hypothesis tests. In particular, we prove that our adaptive indirect estimator achieves the minimum asymptotic variance among all "well-behaved" consistent estimators based on the released summary statistic. Our simulation studies show that our framework produces confidence intervals with well-calibrated coverage and performs hypothesis testing with the correct type I error, giving state-of-the-art performance for inference on location-scale normals, simple linear regression, and logistic regression.

Related papers

Principled Input-Output-Conditioned Post-Hoc Uncertainty Estimation for Regression Networks [1.4671424999873808]
Uncertainty is critical in safety-sensitive applications but is often omitted from off-the-shelf neural networks due to adverse effects on predictive performance.<n>We propose a theoretically grounded framework for post-hoc uncertainty estimation in regression tasks by fitting an auxiliary model to both original inputs and frozen model outputs.
arXiv Detail & Related papers (2025-06-01T09:13:27Z)
Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.<n>The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.<n>The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
Resampling methods for private statistical inference [1.8110941972682346]
We consider the task of constructing confidence intervals with differential privacy. We propose two private variants of the non-parametric bootstrap, which privately compute the median of the results of multiple "little" bootstraps run on partitions of the data. For a fixed differential privacy parameter $epsilon$, our methods enjoy the same error rates as that of the non-private bootstrap to within logarithmic factors in the sample size $n$.
arXiv Detail & Related papers (2024-02-11T08:59:02Z)
Finite Sample Confidence Regions for Linear Regression Parameters Using Arbitrary Predictors [1.6860963320038902]
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor. The derived confidence regions can be cast as constraints within a Mixed Linear Programming framework, enabling optimisation of linear objectives. Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing.
arXiv Detail & Related papers (2024-01-27T00:15:48Z)
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
Simulation-based, Finite-sample Inference for Privatized Data [14.218697973204065]
We propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests. We show that this methodology is applicable to a wide variety of private inference problems.
arXiv Detail & Related papers (2023-03-09T15:19:31Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
Robust Output Analysis with Monte-Carlo Methodology [0.0]
In predictive modeling with simulation or machine learning, it is critical to accurately assess the quality of estimated values. We propose a unified output analysis framework for simulation and machine learning outputs through the lens of Monte Carlo sampling.
arXiv Detail & Related papers (2022-07-27T16:21:59Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner. We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
Parametric Bootstrap for Differentially Private Confidence Intervals [8.781431682774484]
We develop a practical and general-purpose approach to construct confidence intervals for differentially private parametric estimation. We find that the parametric bootstrap is a simple and effective solution.
arXiv Detail & Related papers (2020-06-14T00:08:19Z)
Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties. We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE) When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.