Stability is Stable: Connections between Replicability, Privacy, and
Adaptive Generalization
- URL: http://arxiv.org/abs/2303.12921v2
- Date: Sat, 25 Mar 2023 03:12:34 GMT
- Title: Stability is Stable: Connections between Replicability, Privacy, and
Adaptive Generalization
- Authors: Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei,
Toniann Pitassi, Satchit Sivakumar, Jessica Sorrell
- Abstract summary: A replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution.
Using replicable algorithms for data analysis can facilitate the verification of published results.
We establish new connections and separations between replicability and standard notions of algorithmic stability.
- Score: 26.4468964378511
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The notion of replicable algorithms was introduced in Impagliazzo et al.
[STOC '22] to describe randomized algorithms that are stable under the
resampling of their inputs. More precisely, a replicable algorithm gives the
same output with high probability when its randomness is fixed and it is run on
a new i.i.d. sample drawn from the same distribution. Using replicable
algorithms for data analysis can facilitate the verification of published
results by ensuring that the results of an analysis will be the same with high
probability, even when that analysis is performed on a new data set.
In this work, we establish new connections and separations between
replicability and standard notions of algorithmic stability. In particular, we
give sample-efficient algorithmic reductions between perfect generalization,
approximate differential privacy, and replicability for a broad class of
statistical problems. Conversely, we show any such equivalence must break down
computationally: there exist statistical problems that are easy under
differential privacy, but that cannot be solved replicably without breaking
public-key cryptography. Furthermore, these results are tight: our reductions
are statistically optimal, and we show that any computational separation
between DP and replicability must imply the existence of one-way functions.
Our statistical reductions give a new algorithmic framework for translating
between notions of stability, which we instantiate to answer several open
questions in replicability and privacy. This includes giving sample-efficient
replicable algorithms for various PAC learning, distribution estimation, and
distribution testing problems, algorithmic amplification of $\delta$ in
approximate DP, conversions from item-level to user-level privacy, and the
existence of private agnostic-to-realizable learning reductions under
structured distributions.
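To make the definition in the abstract concrete, here is a minimal Python sketch of a replicable mean estimator. It is not taken from the paper: it uses randomized rounding to a grid with a shared random offset, a simple device for illustrating replicability. The names `replicable_mean`, `agreement_rate`, `shared_seed`, and `grid_width` are hypothetical, and the data are assumed to lie in [0, 1].

```python
import math
import random


def replicable_mean(sample, shared_seed, grid_width=0.1):
    """Round the empirical mean to a grid whose offset is shared randomness.

    Illustrative sketch only (not the paper's algorithm). Two runs that use
    the same shared_seed use the same grid, so they return the same value
    whenever both empirical means fall into the same grid cell.
    """
    rng = random.Random(shared_seed)          # the fixed internal randomness
    offset = rng.uniform(0.0, grid_width)     # random shift of the grid
    mean = sum(sample) / len(sample)
    cell = math.floor((mean - offset) / grid_width)
    # Return the centre of the cell containing the empirical mean.
    return offset + (cell + 0.5) * grid_width


def agreement_rate(n=2000, trials=200, p=0.3, grid_width=0.1):
    """Fraction of trials in which two fresh i.i.d. samples give equal output."""
    data_rng = random.Random(0)               # source of the i.i.d. samples
    agree = 0
    for t in range(trials):
        s1 = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        s2 = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        # Same shared seed t for both runs, different samples.
        if replicable_mean(s1, t, grid_width) == replicable_mean(s2, t, grid_width):
            agree += 1
    return agree / trials


if __name__ == "__main__":
    print("agreement rate:", agreement_rate())
```

The two runs disagree only when a grid boundary lands between the two empirical means, so in this sketch agreement becomes more likely as the sample size grows or the grid gets coarser, at the cost of a rounding error of at most half of `grid_width`.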
Related papers
- Replicability in High Dimensional Statistics [18.543059748500358]
We study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks.
Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings.
arXiv Detail & Related papers (2024-06-04T00:06:42Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
- On Correlation Detection and Alignment Recovery of Gaussian Databases [5.33024001730262]
Correlation detection is a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, they are correlated.
We develop bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector.
When the databases are accepted as correlated, the algorithm also recovers some partial alignment between the given databases.
arXiv Detail & Related papers (2022-11-02T12:01:42Z)
- Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation [8.9598796481325]
We investigate the consequences of this observation for both algorithms and computational complexity across different statistical problems.
We establish an information-computation gap for private sparse mean estimation.
We also give evidence for privacy-induced information-computation gaps for several other statistics and learning problems.
arXiv Detail & Related papers (2022-11-01T20:03:41Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
- Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix [2.345015036605934]
We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach.
In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems.
In the second stage of division, we innovate a stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE.
arXiv Detail & Related papers (2020-03-17T19:12:21Z)