Replicable Distribution Testing
- URL: http://arxiv.org/abs/2507.02814v1
- Date: Thu, 03 Jul 2025 17:27:11 GMT
- Title: Replicable Distribution Testing
- Authors: Ilias Diakonikolas, Jingyi Gao, Daniel Kane, Sihan Liu, Christopher Ye,
- Abstract summary: Given independent samples, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions.<n>On the algorithmic front, we develop new algorithms for testing closeness and independence of discrete distributions.<n>On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing.
- Score: 38.76577965182225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing -- answering an open question from prior work -- and closeness testing.
Related papers
- On the Structure of Replicable Hypothesis Testers [19.10307834463581]
A hypothesis testing algorithm is replicable if, when run on two different samples from the same distribution, it produces the same output with high probability.<n>We build general tools to prove lower and upper bounds on the sample complexity of replicable testers.<n>We identify a set of canonical properties, and prove that any replicable testing algorithm can be modified to satisfy these properties.
arXiv Detail & Related papers (2025-07-03T17:51:31Z) - Optimal Algorithms for Augmented Testing of Discrete Distributions [25.818433126197036]
We show that a predictor can indeed reduce the number of samples required for all three property testing tasks.<n>A key advantage of our algorithms is their adaptability to the precision of the prediction.<n>We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal.
arXiv Detail & Related papers (2024-12-01T21:31:22Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Stability is Stable: Connections between Replicability, Privacy, and
Adaptive Generalization [26.4468964378511]
A replicable algorithm gives the same output with high probability when its randomness is fixed.
Using replicable algorithms for data analysis can facilitate the verification of published results.
We establish new connections and separations between replicability and standard notions of algorithmic stability.
arXiv Detail & Related papers (2023-03-22T21:35:50Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Communication-constrained hypothesis testing: Optimality, robustness,
and reverse data processing inequalities [8.010966370223985]
We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting.
Our framework extends to robust hypothesis testing, where the distributions are corrupted in the total variation distance.
arXiv Detail & Related papers (2022-06-06T17:42:11Z) - Group Testing with Non-identical Infection Probabilities [59.96266198512243]
We develop an adaptive group testing algorithm using the set formation method.
We show that our algorithm outperforms the state of the art, and performs close to the entropy lower bound.
arXiv Detail & Related papers (2021-08-27T17:53:25Z) - Revisiting the Sample Complexity of Sparse Spectrum Approximation of
Gaussian Processes [60.479499225746295]
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
Our approximation is obtained from an improved sample complexity analysis for sparse spectrum Gaussian processes (SSGPs)
arXiv Detail & Related papers (2020-11-17T05:41:50Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.