The Sample Complexity of Distributed Simple Binary Hypothesis Testing under Information Constraints
- URL: http://arxiv.org/abs/2506.13686v1
- Date: Mon, 16 Jun 2025 16:50:27 GMT
- Title: The Sample Complexity of Distributed Simple Binary Hypothesis Testing under Information Constraints
- Authors: Hadi Kazemi, Ankit Pensia, Varun Jog
- Abstract summary: We show that sequential interaction does not help reduce the sample complexity of distributed simple binary hypothesis testing. The second problem suggests tightening existing sample complexity bounds for communication-constrained simple binary hypothesis testing.
- Score: 7.10734752120062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper resolves two open problems from a recent paper, arXiv:2403.16981, concerning the sample complexity of distributed simple binary hypothesis testing under information constraints. The first open problem asks whether interaction reduces the sample complexity of distributed simple binary hypothesis testing. In this paper, we show that sequential interaction does not help. The second problem suggests tightening existing sample complexity bounds for communication-constrained simple binary hypothesis testing. We derive optimally tight bounds for this setting and resolve this problem. Our main technical contributions are: (i) a one-shot lower bound on the Bayes error in simple binary hypothesis testing that satisfies a crucial tensorisation property; (ii) a streamlined proof of the formula for the sample complexity of simple binary hypothesis testing without constraints, first established in arXiv:2403.16981; and (iii) a reverse data-processing inequality for Hellinger-$\lambda$ divergences, generalising the results from arXiv:1812.03031 and arXiv:2206.02765.
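The baseline against which these constrained bounds are stated is the classical fact that, without information constraints, distinguishing $p$ from $q$ at constant error requires $\Theta(1/h^2(p,q))$ i.i.d. samples, where $h$ is the Hellinger distance. A minimal sketch of this scaling, using hypothetical example distributions (not taken from the paper):

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance h^2(p, q) = 1 - sum_i sqrt(p_i * q_i)
    between two discrete distributions given as probability vectors."""
    return 1.0 - sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

# Two hypothetical distributions on a 3-symbol alphabet.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

h2 = hellinger_sq(p, q)
# In the unconstrained setting, the sample complexity of testing p vs. q
# at constant error scales as Theta(1 / h^2(p, q)).
n_scale = 1.0 / h2
print(f"h^2(p, q) = {h2:.4f}; sample complexity scales as Theta({n_scale:.0f})")
```

The reverse data-processing inequalities developed in the paper control how much information constraints (e.g., quantization to a few bits per sample) can shrink such divergences, which is what drives the constrained sample-complexity bounds.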
Related papers
- Replicable Distribution Testing [38.76577965182225]
Given independent samples, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing.
arXiv Detail & Related papers (2025-07-03T17:27:11Z) - The Sample Complexity of Simple Binary Hypothesis Testing [7.127829790714167]
The sample complexity of simple binary hypothesis testing is the smallest number of i.i.d. samples required to distinguish between two distributions $p$ and $q$ in either setting. We derive a formula that characterizes the sample complexity (up to multiplicative constants that are independent of $p$, $q$, and all error parameters) for: (i) all $0 \le \alpha, \beta \le 1/8$ in the prior-free setting; and (ii) all $\delta \le \alpha/4$ in the Bayesian setting.
arXiv Detail & Related papers (2024-03-25T17:42:32Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Precise Error Rates for Computationally Efficient Testing [67.30044609837749]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity. An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - A Primal-Dual Approach to Solving Variational Inequalities with General Constraints [54.62996442406718]
Yang et al. (2023) recently showed how to use first-order gradient methods to solve general variational inequalities.
We prove the convergence of this method and show that the gap function of the last iterate of the method decreases at a rate of $O(1/\sqrt{K})$ when the operator is $L$-Lipschitz and monotone.
arXiv Detail & Related papers (2022-10-27T17:59:09Z) - Communication-constrained hypothesis testing: Optimality, robustness, and reverse data processing inequalities [8.010966370223985]
We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting.
Our framework extends to robust hypothesis testing, where the distributions are corrupted in the total variation distance.
arXiv Detail & Related papers (2022-06-06T17:42:11Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - A tetrachotomy of ontology-mediated queries with a covering axiom [1.749935196721634]
Our concern is the problem of efficiently determining the data complexity of answering queries mediated by description logic ontologies and constructing their optimal rewritings to standard database queries.
We focus on Boolean conjunctive queries mediated by a covering axiom, called disjunctive sirups (or d-sirups).
Some d-sirups only have exponential-size resolution proofs, some only double-exponential-size positive existential rewritings and single-exponential-size nonrecursive datalog rewritings.
arXiv Detail & Related papers (2020-06-07T14:47:07Z) - Counterexamples to the Low-Degree Conjecture [80.3668228845075]
A conjecture of Hopkins posits that for certain high-dimensional hypothesis testing problems, no polynomial-time algorithm can outperform so-called "simple statistics".
This conjecture formalizes the beliefs surrounding a line of recent work that seeks to understand statistical-versus-computational tradeoffs.
arXiv Detail & Related papers (2020-04-17T21:08:11Z) - The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.