Related papers: A New Analysis of Differential Privacy's Generalization Guarantees

A New Analysis of Differential Privacy's Generalization Guarantees

URL: http://arxiv.org/abs/1909.03577v2
Date: Tue, 4 Jun 2024 03:08:58 GMT
Title: A New Analysis of Differential Privacy's Generalization Guarantees
Authors: Christopher Jung, Katrina Ligett, Seth Neel, Aaron Roth, Saeed Sharifi-Malvajerdi, Moshe Shenfeld,
Abstract summary: We show that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere.
Score: 11.485744204944627
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the posterior distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its posterior expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the posterior distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds, and in particular, avoids the "monitor argument" used to derive high probability bounds in prior work. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds, even though the improvements are in the constants, rather than the asymptotics (which are known to be tight). As we show, our new bounds outperform the naive "sample-splitting" baseline at dramatically smaller dataset sizes compared to the previous state of the art, bringing techniques from this literature closer to practicality.

Related papers

Tight Lower Bounds and Improved Convergence in Performative Prediction [29.169972807928]
We extend the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous snapshots. We introduce a new upper bound for methods that use only the final iteration of the dataset. We empirically observe faster convergence to the stable point on various performative prediction benchmarks.
arXiv Detail & Related papers (2024-12-04T19:06:19Z)
Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution. Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
Predicting generalization performance with correctness discriminators [64.00420578048855]
We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data. We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds.
arXiv Detail & Related papers (2023-11-15T22:43:42Z)
Reproducible Parameter Inference Using Bagged Posteriors [9.975422461924705]
Under model misspecification, it is known that Bayesian posteriors often do not properly quantify uncertainty about true or pseudo-true parameters. We consider the probability that two confidence sets constructed from independent data sets have nonempty overlap. We show that credible sets from the standard posterior can strongly violate this bound, particularly in high-dimensional settings.
arXiv Detail & Related papers (2023-11-03T16:28:16Z)
Variational Inference with Coverage Guarantees in Simulation-Based Inference [18.818573945984873]
We propose Conformalized Amortized Neural Variational Inference (CANVI) CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor. We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation.
arXiv Detail & Related papers (2023-05-23T17:24:04Z)
Sequential Predictive Two-Sample and Independence Testing [114.4130718687858]
We study the problems of sequential nonparametric two-sample and independence testing. We build upon the principle of (nonparametric) testing by betting.
arXiv Detail & Related papers (2023-04-29T01:30:33Z)
Distribution-Free Finite-Sample Guarantees and Split Conformal Prediction [0.0]
split conformal prediction represents a promising avenue to obtain finite-sample guarantees under minimal distribution-free assumptions. We highlight the connection between split conformal prediction and classical tolerance predictors developed in the 1940s.
arXiv Detail & Related papers (2022-10-26T14:12:24Z)
Robustness Implies Generalization via Data-Dependent Generalization Bounds [24.413499775513145]
This paper proves that robustness implies generalization via data-dependent generalization bounds. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable.
arXiv Detail & Related papers (2022-06-27T17:58:06Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner. We propose a family of online debiasing estimators to correct these distributional anomalies in at least squares estimation. We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.