Related papers: Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization

URL: http://arxiv.org/abs/2303.12921v2
Date: Sat, 25 Mar 2023 03:12:34 GMT
Title: Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization
Authors: Mark Bun, Marco Gaboardi, Max Hopkins, Russell Impagliazzo, Rex Lei, Toniann Pitassi, Satchit Sivakumar, Jessica Sorrell
Abstract summary: A replicable algorithm gives the same output with high probability when its randomness is fixed. Using replicable algorithms for data analysis can facilitate the verification of published results. We establish new connections and separations between replicability and standard notions of algorithmic stability.
Score: 26.4468964378511
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The notion of replicable algorithms was introduced in Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set. In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy, but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of $\delta$ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
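The replicability definition in the abstract can be illustrated with a small simulation. The block below is a minimal sketch, not the paper's construction: it assumes a simple randomized-rounding estimator whose grid offset is drawn from the shared randomness. The names `replicable_mean`, `draw_sample`, and `agreement_rate`, along with the grid width and sample sizes, are hypothetical choices made only for illustration.

```python
import random


def replicable_mean(sample, shared_seed, width=0.1):
    """Round the empirical mean to a grid whose offset comes from shared randomness.

    Illustrative assumption: a randomized-rounding sketch, not the paper's algorithm.
    """
    rng = random.Random(shared_seed)   # the fixed internal randomness, shared across runs
    offset = rng.uniform(0.0, width)   # random shift of the rounding grid
    mean = sum(sample) / len(sample)
    k = round((mean - offset) / width)  # index of the nearest grid point
    return offset + k * width


def draw_sample(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) values (a hypothetical data distribution)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def agreement_rate(trials=100, n=5000, p=0.3, width=0.1):
    """Fraction of trials in which two fresh samples give the identical output."""
    data_rng = random.Random(12345)
    agree = 0
    for seed in range(trials):
        first = replicable_mean(draw_sample(n, p, data_rng), seed, width)
        second = replicable_mean(draw_sample(n, p, data_rng), seed, width)
        agree += first == second
    return agree / trials


if __name__ == "__main__":
    print("agreement rate:", agreement_rate())
```

Two runs disagree only when a grid boundary falls between the two empirical means, so a wider grid or a larger sample raises the agreement rate at the cost of accuracy or data.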

Related papers

Statistical Inference for Misspecified Contextual Bandits [6.178061357164435]
Contextual bandit algorithms have transformed modern experimentation by enabling real-time adaptation for personalized treatment. Yet these advantages create challenges for statistical inference due to adaptivity. Convergence ensures replicability of adaptive experiments and stability of online algorithms.
arXiv Detail & Related papers (2025-09-08T02:19:37Z)
On the Structure of Replicable Hypothesis Testers [19.10307834463581]
A hypothesis testing algorithm is replicable if, when run on two different samples from the same distribution, it produces the same output with high probability. We build general tools to prove lower and upper bounds on the sample complexity of replicable testers. We identify a set of canonical properties, and prove that any replicable testing algorithm can be modified to satisfy these properties.
arXiv Detail & Related papers (2025-07-03T17:51:31Z)
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget [55.938644481736446]
We introduce a novel algorithm for best feasible arm identification that guarantees an exponential decay in the error probability. We validate our algorithm through comprehensive empirical evaluations across various problem instances with different levels of complexity.
arXiv Detail & Related papers (2025-06-03T02:56:26Z)
Sample-Optimal Private Regression in Polynomial Time [3.3748750222488657]
We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers.
arXiv Detail & Related papers (2025-03-31T17:08:12Z)
Fairness with Exponential Weights [4.368185344922342]
Motivated by the need to remove discrimination in certain applications, we develop a meta-algorithm that can convert any efficient implementation of Hedge into an efficient algorithm for the equivalent contextual bandit problem. Relative to any algorithm with statistical parity, the resulting algorithm has the same regret bound as running the corresponding instance of Exp4 for each protected characteristic independently.
arXiv Detail & Related papers (2024-11-06T22:25:56Z)
Replicability in High Dimensional Statistics [18.543059748500358]
We study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks. Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetrics.
arXiv Detail & Related papers (2024-06-04T00:06:42Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z)
On Correlation Detection and Alignment Recovery of Gaussian Databases [5.33024001730262]
Correlation detection is a hypothesis testing problem; under the null hypothesis, the databases are independent, and under the alternate hypothesis, they are correlated. We develop bounds on the type-I and type-II error probabilities, and show that the analyzed detector performs better than a recently proposed detector. When the databases are accepted as correlated, the algorithm also recovers some partial alignment between the given databases.
arXiv Detail & Related papers (2022-11-02T12:01:42Z)
Privacy Induces Robustness: Information-Computation Gaps and Sparse Mean Estimation [8.9598796481325]
We investigate the consequences of the observation that privacy induces robustness, for both algorithms and computational complexity across different statistical problems. We establish an information-computation gap for private sparse mean estimation. We also give evidence for privacy-induced information-computation gaps for several other statistics and learning problems.
arXiv Detail & Related papers (2022-11-01T20:03:41Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples. We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix [2.345015036605934]
We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both latent parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems. In the second stage of division, we innovate a stagewise learning technique, consisting of a sequence of simple incremental paths, to efficiently trace out the whole solution of CURE.
arXiv Detail & Related papers (2020-03-17T19:12:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.