Replicability and stability in learning
- URL: http://arxiv.org/abs/2304.03757v2
- Date: Wed, 12 Apr 2023 17:10:27 GMT
- Title: Replicability and stability in learning
- Authors: Zachary Chase, Shay Moran, Amir Yehudayoff
- Abstract summary: Impagliazzo, Lei, Pitassi and Sorrell ('22) recently initiated the study of replicability in machine learning.
We show how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1.
We prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1.
- Score: 16.936594801109557
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Replicability is essential in science as it allows us to validate and verify
research findings. Impagliazzo, Lei, Pitassi and Sorrell ('22) recently
initiated the study of replicability in machine learning. A learning algorithm
is replicable if it typically produces the same output when applied on two
i.i.d. inputs using the same internal randomness. We study a variant of
replicability that does not involve fixing the randomness. An algorithm
satisfies this form of replicability if it typically produces the same output
when applied on two i.i.d. inputs (without fixing the internal randomness).
This variant is called global stability and was introduced by Bun, Livni and
Moran ('20) in the context of differential privacy.
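For concreteness, the two notions can be written as follows. This is only a sketch with generic parameters $\rho$ and $\eta$; the paper's exact parameterization may differ.

```latex
% Sketch of the two notions for a learning algorithm A with sample size n,
% internal randomness r, and input distribution D (parameter names illustrative).

% Replicability (Impagliazzo, Lei, Pitassi, Sorrell '22): with the randomness r
% shared, two i.i.d. samples typically lead to the same output.
\Pr_{S,\,S'\sim D^n,\; r}\bigl[\,A(S;r) = A(S';r)\,\bigr] \;\ge\; 1-\rho

% Global stability (Bun, Livni, Moran '20): without fixing the randomness,
% some single output h is produced with probability at least eta.
\exists\, h:\quad \Pr_{S\sim D^n}\bigl[\,A(S) = h\,\bigr] \;\ge\; \eta
```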
Impagliazzo et al. showed how to boost any replicable algorithm so that it
produces the same output with probability arbitrarily close to 1. In contrast,
we demonstrate that for numerous learning tasks, global stability can only be
accomplished weakly, where the same output is produced only with probability
bounded away from 1. To overcome this limitation, we introduce the concept of
list replicability, which is equivalent to global stability. Moreover, we prove
that list replicability can be boosted so that it is achieved with probability
arbitrarily close to 1. We also describe basic relations between standard
learning-theoretic complexity measures and list replicable numbers. Our
results also imply that, except in trivial cases, replicable algorithms (in the
sense of Impagliazzo et al.) must be randomized.
The proof of the impossibility result is based on a topological fixed-point
theorem. For every algorithm, we are able to locate a "hard input distribution"
by applying the Poincaré-Miranda theorem in a related topological setting.
The equivalence between global stability and list replicability is algorithmic.
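To illustrate the shared-randomness notion defined above (this is not the paper's construction), a minimal Monte Carlo estimate of an algorithm's replicability could look like the following sketch. The function names, the toy sign-of-the-mean learner, and the choice of trials are all hypothetical.

```python
import numpy as np

def replicability_estimate(learner, sample_dist, n, trials=200, seed=0):
    """Monte Carlo estimate of Pr[A(S; r) = A(S'; r)] for a learner A.

    `sample_dist(n, rng)` draws an i.i.d. sample of size n;
    `learner(sample, rng)` returns a comparable output (e.g. a label or a
    tuple of parameters).  Both samples S and S' are run with the *same*
    internal randomness r, matching the shared-randomness definition.
    """
    master = np.random.default_rng(seed)
    agreements = 0
    for _ in range(trials):
        r = int(master.integers(2**31))               # shared internal randomness
        data_rng = np.random.default_rng(int(master.integers(2**31)))
        S1 = sample_dist(n, data_rng)                 # first i.i.d. sample
        S2 = sample_dist(n, data_rng)                 # second i.i.d. sample
        out1 = learner(S1, np.random.default_rng(r))
        out2 = learner(S2, np.random.default_rng(r))
        agreements += int(out1 == out2)
    return agreements / trials

if __name__ == "__main__":
    # Toy deterministic learner: output the sign of the sample mean.
    draw = lambda n, rng: rng.normal(loc=0.05, scale=1.0, size=n)
    sign_learner = lambda S, rng: int(np.sign(S.mean()))  # ignores r
    print("estimated replicability:",
          replicability_estimate(sign_learner, draw, n=100))
```

Because the toy learner ignores the shared randomness r, the estimate reduces to agreement across two i.i.d. samples without any fixed randomness, which is the regime of global stability; the abstract's impossibility result says that for many learning tasks this agreement stays bounded away from 1, so non-trivial replicable algorithms must be randomized.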
Related papers
- Replicability in High Dimensional Statistics [18.543059748500358]
We study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks.
Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetrics.
arXiv Detail & Related papers (2024-06-04T00:06:42Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Batch Bayesian Optimization for Replicable Experimental Design [56.64902148159355]
Many real-world design problems evaluate multiple experimental conditions in parallel and replicate each condition multiple times due to large and heteroscedastic observation noise.
We propose the Batch Thompson Sampling for Replicable Experimental Design framework, which encompasses three algorithms.
We show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
arXiv Detail & Related papers (2023-11-02T12:46:03Z) - Replicable Reinforcement Learning [15.857503103543308]
We provide a provably replicable algorithm for parallel value iteration, and a provably replicable version of R-max in the episodic setting.
These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
arXiv Detail & Related papers (2023-05-24T16:05:15Z) - List and Certificate Complexities in Replicable Learning [0.7829352305480285]
We consider two feasible notions of replicability called list replicability and certificate replicability.
We design algorithms for certain learning problems that are optimal in list and certificate complexity.
arXiv Detail & Related papers (2023-04-05T06:05:27Z) - Stability is Stable: Connections between Replicability, Privacy, and
Adaptive Generalization [26.4468964378511]
A replicable algorithm gives the same output with high probability when its randomness is fixed.
Using replicable algorithms for data analysis can facilitate the verification of published results.
We establish new connections and separations between replicability and standard notions of algorithmic stability.
arXiv Detail & Related papers (2023-03-22T21:35:50Z) - $\omega$PAP Spaces: Reasoning Denotationally About Higher-Order,
Recursive Probabilistic and Differentiable Programs [64.25762042361839]
$\omega$PAP spaces are spaces for reasoning denotationally about expressive differentiable and probabilistic programming languages.
Our semantics is general enough to assign meanings to most practical probabilistic and differentiable programs.
We establish the almost-everywhere differentiability of probabilistic programs' trace density functions.
arXiv Detail & Related papers (2023-02-21T12:50:05Z) - Replicable Clustering [57.19013971737493]
We propose algorithms for the statistical $k$-medians, statistical $k$-means, and statistical $k$-centers problems by utilizing approximation routines for their counterparts in a black-box manner.
We also provide experiments on synthetic 2D distributions, using the $k$-means++ implementation from sklearn as a black box, that validate our theoretical results; a sketch of such a setup appears after this list.
arXiv Detail & Related papers (2023-02-20T23:29:43Z) - Reproducibility in Learning [8.386806623480156]
A reproducible learning algorithm is resilient to variations in its samples.
Despite the strength of this requirement, there are efficient reproducible algorithms for several fundamental problems in statistics and learning.
arXiv Detail & Related papers (2022-01-20T19:59:11Z) - ReLU Regression with Massart Noise [52.10842036932169]
We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data.
We focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model.
We develop an efficient algorithm that achieves exact parameter recovery in this model.
arXiv Detail & Related papers (2021-09-10T02:13:22Z) - Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding [67.54760086239514]
We study the issue of recurrent language models producing infinite-length sequences under incomplete decoding.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
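To make the black-box experimental setup mentioned in the Replicable Clustering entry above concrete, here is a hedged sketch (not that paper's replicable algorithm): it fits sklearn's k-means++ on two i.i.d. samples from the same synthetic 2D mixture and reports how far apart the two sets of centers land. The mixture parameters, seeds, and the canonical ordering of centers are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def two_sample_centers(n=2000, k=3, shared_seed=0):
    """Fit k-means++ (as a black box) on two i.i.d. samples from the same
    synthetic 2D mixture and return both sets of cluster centers, sorted
    into a canonical order so they can be compared directly."""
    rng = np.random.default_rng(42)
    means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

    def draw(m):
        labels = rng.integers(len(means), size=m)
        return means[labels] + rng.normal(scale=1.0, size=(m, 2))

    centers = []
    for sample in (draw(n), draw(n)):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=shared_seed).fit(sample)
        c = km.cluster_centers_
        centers.append(c[np.lexsort((c[:, 1], c[:, 0]))])  # canonical order
    return centers

if __name__ == "__main__":
    c1, c2 = two_sample_centers()
    print("max center deviation between the two runs:", np.abs(c1 - c2).max())
```

Without a replicability-specific wrapper, the two fitted center sets generally differ because the samples differ; closing exactly that gap is what replicable clustering procedures are designed to do.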
This list is automatically generated from the titles and abstracts of the papers in this site.