Differential privacy with dependent data
- URL: http://arxiv.org/abs/2511.18583v2
- Date: Tue, 25 Nov 2025 03:41:49 GMT
- Title: Differential privacy with dependent data
- Authors: Valentin Roth, Marco Avella-Medina
- Abstract summary: We show that Winsorized mean estimators can be used under dependence for bounded and unbounded data. We formalize dependence via log-Sobolev inequalities on the joint distribution of observations. Our work constitutes a first step towards a systematic study of Differential Privacy (DP) for dependent data.
- Score: 1.8835490533310795
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP) and in particular user-level DP provide a natural formalization of privacy requirements for processing dependent data where each individual provides multiple observations to the dataset. However, dependence, introduced e.g. through repeated measurements, challenges the existing statistical theory under DP constraints. In i.i.d. settings, noisy Winsorized mean estimators have been shown to be minimax optimal for standard (item-level) and user-level DP estimation of a mean $\mu \in \mathbb{R}^d$. Yet, their behavior on potentially dependent observations has not previously been studied. We fill this gap and show that Winsorized mean estimators can also be used under dependence for bounded and unbounded data, and can lead to asymptotic and finite sample guarantees that resemble their i.i.d. counterparts under a weak notion of dependence. For this, we formalize dependence via log-Sobolev inequalities on the joint distribution of observations. This enables us to adapt the stable histogram by Karwa and Vadhan (2018) to a non-i.i.d. setting, which we then use to estimate the private projection intervals of the Winsorized estimator. The resulting guarantees for our item-level mean estimator extend to user-level mean estimation and transfer to the local model via a randomized response histogram. Using the mean estimators as building blocks, we provide extensions to random effects models, longitudinal linear regression and nonparametric regression. Therefore, our work constitutes a first step towards a systematic study of DP for dependent data.
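The two-step recipe the abstract describes (privately locate a projection interval with a stable histogram, then release a noisy Winsorized mean) can be sketched for one-dimensional, item-level DP in the central model. This is a minimal illustration under our own assumptions, not the authors' estimator: the helper names `stable_histogram_mode`, `winsorized_mean` and `private_mean`, the bin width, the interval half-width, the even split of the privacy budget, and the threshold 2 log(2/delta)/eps + 1 (our reading of the Karwa and Vadhan (2018) construction) are all hypothetical choices. Neighboring datasets are assumed to differ by replacing one observation, so the clipped mean has sensitivity (hi - lo)/n, and the two steps compose to (eps, delta)-DP. The privacy guarantee does not rely on independence; dependence enters only through accuracy, which is where the paper's assumptions matter. The AR(1) series is a made-up example of dependent data.

```python
import math
from collections import Counter

import numpy as np


def stable_histogram_mode(x, bin_width, eps, delta, rng):
    """Noisy counts over bins [k*w, (k+1)*w), computed for non-empty bins only.

    A bin is kept only if its noisy count exceeds a threshold, so bins that
    hinge on a single observation are unlikely to be released. Returns the
    center of the kept bin with the largest noisy count, or None.
    """
    counts = Counter(np.floor(x / bin_width).astype(int).tolist())
    # Assumed constants: Lap(2/eps) noise (replacing one point changes two
    # counts by 1) and threshold 2*log(2/delta)/eps + 1.
    threshold = 2.0 * math.log(2.0 / delta) / eps + 1.0
    best_bin, best_noisy = None, -math.inf
    for k, c in counts.items():
        noisy = c + rng.laplace(scale=2.0 / eps)
        if noisy > threshold and noisy > best_noisy:
            best_bin, best_noisy = k, noisy
    if best_bin is None:
        return None
    return (best_bin + 0.5) * bin_width


def winsorized_mean(x, lo, hi, eps, rng):
    """Clip to [lo, hi], average, and add Laplace noise scaled to the
    replace-one sensitivity (hi - lo) / n of the clipped mean."""
    n = len(x)
    clipped = np.clip(x, lo, hi)
    return clipped.mean() + rng.laplace(scale=(hi - lo) / (n * eps))


def private_mean(x, bin_width, half_width, eps, delta, rng):
    """Locate an interval privately, then Winsorize inside it.
    The budget eps is split evenly between the two steps (an assumption)."""
    center = stable_histogram_mode(x, bin_width, eps / 2, delta, rng)
    if center is None:
        return None  # no bin cleared the threshold
    lo, hi = center - half_width, center + half_width
    return winsorized_mean(x, lo, hi, eps / 2, rng)


# Hypothetical dependent data: an AR(1) series fluctuating around mean 3.
rng = np.random.default_rng(0)
n, phi = 5000, 0.5
z = np.empty(n)
z[0] = rng.normal()
for t in range(1, n):
    z[t] = phi * z[t - 1] + rng.normal()
x = 3.0 + z

est = private_mean(x, bin_width=1.0, half_width=4.0, eps=1.0, delta=1e-6, rng=rng)
print("private mean estimate:", est)
```

With a long series the most populated bin holds far more points than the threshold, so the interval is found reliably. With few observations or very narrow bins, `private_mean` may return None, and one would need wider bins or more data.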
Related papers
- Differentially Private Inference for Longitudinal Linear Regression [9.16331221881594]
We develop a comprehensive framework for estimation and inference in longitudinal linear regression under user-level DP. For inference, we develop a privatized estimator that is automatically heteroskedasticity- and autocorrelation-consistent. These results provide the first unified framework for practical user-level DP estimation and inference.
arXiv (2026-01-15T17:47:02Z)
- Private Statistical Estimation via Truncation [5.642973820558159]
We introduce a novel framework for differentially private statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. By leveraging techniques from truncated statistics, we develop computationally efficient DP estimators for exponential family distributions.
arXiv (2025-05-18T20:38:38Z)
- Assumption-Lean Post-Integrated Inference with Surrogate Control Outcomes [6.448728765953916]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using control outcomes. We develop semiparametric inference on projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv (2024-10-07T12:52:38Z)
- Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation [0.0]
The Gaussian Process (GP) is a flexible non-linear regression approach.
It provides a principled approach to handling our uncertainty over predicted (counterfactual) values.
This is especially valuable under conditions of extrapolation or weak overlap.
arXiv (2024-07-15T05:09:50Z)
- Semi-supervised Regression Analysis with Model Misspecification and High-dimensional Data [8.619243141968886]
We present an inference framework for estimating regression coefficients in conditional mean models.
We develop an augmented inverse probability weighted (AIPW) method, employing regularized estimators for both propensity score (PS) and outcome regression (OR) models.
Our theoretical findings are verified through extensive simulation studies and a real-world data application.
arXiv (2024-06-20T00:34:54Z)
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. We propose a method called Stratified Prediction-Powered Inference (StratPPI). We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv (2024-06-06T17:37:39Z)
- Geometry-Aware Instrumental Variable Regression [56.16884466478886]
We propose a transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information.
We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings.
arXiv (2024-05-19T17:49:33Z)
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
arXiv (2023-06-19T17:26:34Z)
- Statistical Estimation from Dependent Data [37.73584699735133]
We consider a general statistical estimation problem wherein binary labels across different observations are not independent conditioned on their feature vectors.
We model these dependencies in the language of Markov Random Fields.
We provide algorithms and statistically efficient estimation rates for this model.
arXiv (2021-07-20T21:18:06Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv (2021-07-05T21:05:11Z)
- Evaluating Model Robustness and Stability to Dataset Shift [7.369475193451259]
We propose a framework for analyzing stability of machine learning models.
We use the original evaluation data to determine distributions under which the algorithm performs poorly.
We estimate the algorithm's performance on the "worst-case" distribution.
arXiv (2020-10-28T17:35:39Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv (2020-06-02T16:26:49Z)
- TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv (2020-04-06T07:32:51Z)