Communication-constrained hypothesis testing: Optimality, robustness,
and reverse data processing inequalities
- URL: http://arxiv.org/abs/2206.02765v2
- Date: Fri, 15 Dec 2023 22:27:39 GMT
- Title: Communication-constrained hypothesis testing: Optimality, robustness,
and reverse data processing inequalities
- Authors: Ankit Pensia, Varun Jog, Po-Ling Loh
- Abstract summary: We show that the sample complexity of simple binary hypothesis testing under communication constraints is at most a logarithmic factor larger than in the unconstrained setting.
Our framework extends to robust hypothesis testing, where the distributions are corrupted in the total variation distance.
- Score: 8.010966370223985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study hypothesis testing under communication constraints, where each
sample is quantized before being revealed to a statistician. Without
communication constraints, it is well known that the sample complexity of
simple binary hypothesis testing is characterized by the Hellinger distance
between the distributions. We show that the sample complexity of simple binary
hypothesis testing under communication constraints is at most a logarithmic
factor larger than in the unconstrained setting and this bound is tight. We
develop a polynomial-time algorithm that achieves the aforementioned sample
complexity. Our framework extends to robust hypothesis testing, where the
distributions are corrupted in the total variation distance. Our proofs rely on
a new reverse data processing inequality and a reverse Markov inequality, which
may be of independent interest. For simple $M$-ary hypothesis testing, the
sample complexity in the absence of communication constraints has a logarithmic
dependence on $M$. We show that communication constraints can cause an
exponential blow-up leading to $\Omega(M)$ sample complexity even for adaptive
algorithms.
Related papers
- Adaptive Sampled Softmax with Inverted Multi-Index: Methods, Theory and Applications [79.53938312089308]
The MIDX-Sampler is a novel adaptive sampling strategy based on an inverted multi-index approach.
Our method is backed by rigorous theoretical analysis, addressing key concerns such as sampling bias, gradient bias, convergence rates, and generalization error bounds.
arXiv Detail & Related papers (2025-01-15T04:09:21Z) - Optimal Algorithms for Augmented Testing of Discrete Distributions [25.818433126197036]
We show that a predictor can indeed reduce the number of samples required for all three property testing tasks.
A key advantage of our algorithms is their adaptability to the precision of the prediction.
We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal.
arXiv Detail & Related papers (2024-12-01T21:31:22Z) - Controllable Generation via Locally Constrained Resampling [77.48624621592523]
We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint.
Our approach considers the entire sequence, leading to a more globally optimal constrained generation than current greedy methods.
We show that our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification.
arXiv Detail & Related papers (2024-10-17T00:49:53Z) - Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems.
We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z) - Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests [5.3595271893779906]
Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations.
We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
arXiv Detail & Related papers (2024-06-10T19:25:19Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Precise Error Rates for Computationally Efficient Testing [75.63895690909241]
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
arXiv Detail & Related papers (2023-11-01T04:41:16Z) - Non-Stochastic CDF Estimation Using Threshold Queries [3.6576781735746513]
We tackle the problem of estimating an empirical distribution in a setting with two challenges.
First, the algorithm does not directly observe the data; instead, it only asks a limited number of threshold queries about each sample.
Second, the data are not assumed to be independent and identically distributed; instead, we allow for an arbitrary process generating the samples.
arXiv Detail & Related papers (2023-01-13T18:00:57Z) - Simple Binary Hypothesis Testing under Local Differential Privacy and
Communication Constraints [8.261182037130407]
We study simple binary hypothesis testing under both local differential privacy (LDP) and communication constraints.
We qualify our results as either minimax optimal or instance optimal.
arXiv Detail & Related papers (2023-01-09T18:36:49Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.