Independence Testing for Bounded Degree Bayesian Network
- URL: http://arxiv.org/abs/2204.08690v1
- Date: Tue, 19 Apr 2022 06:16:14 GMT
- Title: Independence Testing for Bounded Degree Bayesian Network
- Authors: Arnab Bhattacharyya, Clément L. Canonne, and Joy Qiping Yang
- Abstract summary: We show that if $P$ has a sparse structure, then in fact only linearly many samples are required.
We also show that if $P$ is Markov with respect to a Bayesian network whose underlying DAG has in-degree bounded by $d$, then $\tilde{\Theta}(2^{d/2}\cdot n/\varepsilon^2)$ samples are necessary and sufficient.
- Score: 4.230271396864461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the following independence testing problem: given access to samples
from a distribution $P$ over $\{0,1\}^n$, decide whether $P$ is a product
distribution or whether it is $\varepsilon$-far in total variation distance
from any product distribution. For arbitrary distributions, this problem
requires $\exp(n)$ samples. We show in this work that if $P$ has a sparse
structure, then in fact only linearly many samples are required. Specifically,
if $P$ is Markov with respect to a Bayesian network whose underlying DAG has
in-degree bounded by $d$, then $\tilde{\Theta}(2^{d/2}\cdot n/\varepsilon^2)$
samples are necessary and sufficient for independence testing.
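For intuition, the testing task can be made concrete with a brute-force plug-in tester: estimate the joint distribution and the product of its marginals, then compare them in TV distance. This is only feasible for tiny $n$ (consistent with the $\exp(n)$ barrier for arbitrary distributions) and is not the paper's algorithm; the function names, sampling setup, and threshold below are all illustrative assumptions.

```python
import itertools
import random
from collections import Counter

def empirical_tv_to_product(samples, n):
    """Plug-in estimate of the TV distance between the empirical joint
    distribution over {0,1}^n and the product of its empirical marginals."""
    m = len(samples)
    joint = Counter(samples)  # missing outcomes count as 0
    # empirical probability that bit i equals 1
    marg1 = [sum(s[i] for s in samples) / m for i in range(n)]
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        p_joint = joint[x] / m
        p_prod = 1.0
        for i, b in enumerate(x):
            p_prod *= marg1[i] if b else 1 - marg1[i]
        total += abs(p_joint - p_prod)
    return total / 2

random.seed(0)
n, m = 3, 20000
# a genuine product distribution: independent biased bits
prod_samples = [tuple(int(random.random() < 0.3) for _ in range(n))
                for _ in range(m)]
# a far-from-product distribution: bit 1 is a copy of bit 0
corr_samples = []
for _ in range(m):
    b0 = int(random.random() < 0.5)
    corr_samples.append((b0, b0, int(random.random() < 0.5)))

print(empirical_tv_to_product(prod_samples, n))  # small: sampling noise only
print(empirical_tv_to_product(corr_samples, n))  # bounded away from 0 (true TV is 0.5)
```

The loop over all $2^n$ outcomes is exactly what sparsity assumptions such as bounded in-degree let one avoid.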
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators for distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z)
- Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(\mathbf{x}_i, y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i, p(\mathbf{x}_i))$.
The goal is to output a polynomial $\hat{p}$ within $\ell_\infty$-distance at most $O(\sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z)
- Testing with Non-identically Distributed Samples [20.74768558932617]
We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed.
Even $\Theta(k/\varepsilon^2)$ samples from each distribution are sufficient to learn $\mathbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance.
arXiv Detail & Related papers (2023-11-19T01:25:50Z)
- Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost of $m^2 d \log(1/\varepsilon)$.
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z)
- Stochastic Approximation Approaches to Group Distributionally Robust Optimization [96.26317627118912]
We study group distributionally robust optimization (GDRO).
Online learning techniques reduce the number of samples required in each round from $m$ to $1$, keeping the same sample complexity.
We also give a novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z)
- Near-Optimal Bounds for Testing Histogram Distributions [35.18069719489173]
We show that the histogram testing problem has sample complexity $\widetilde{\Theta}\left(\sqrt{nk}/\varepsilon + k/\varepsilon^2 + \sqrt{n}/\varepsilon^2\right)$.
arXiv Detail & Related papers (2022-07-14T01:24:01Z)
- The Price of Tolerance in Distribution Testing [31.10049510641336]
We show the sample complexity to be $\tilde{\Theta}\!\left(\frac{\sqrt{n}}{\varepsilon_2^2} + \frac{n}{\log n}\cdot\max\left\{\frac{\varepsilon_1}{\varepsilon_2^2}, \ldots\right\}\right)$, providing a smooth tradeoff between the two previously known cases.
We also provide a similar characterization for the problem of tolerant equivalence testing, where both $p$ and $q$ are unknown.
arXiv Detail & Related papers (2021-06-25T03:59:42Z)
- The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian with unknown covariance, $X \sim \mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$ …
arXiv Detail & Related papers (2020-12-31T18:24:41Z)
- Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z)
- Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
- Sample Amplification: Increasing Dataset Size even when Learning is Impossible [15.864702679819544]
Given data drawn from an unknown distribution $D$, to what extent is it possible to "amplify" this dataset and output an even larger set of samples that appear to have been drawn from $D$?
We formalize this question as follows: an $\left(n, n + \Theta(n/\sqrt{k})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(k)$ samples.
arXiv Detail & Related papers (2019-04-26T21:42:44Z)
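Several of the bounds above are stated against the classical baseline that learning a distribution over a domain of size $k$ to total variation error $\varepsilon$ takes $\Theta(k/\varepsilon^2)$ samples. A quick empirical sanity check of the "sufficient" direction; the setup and names here are illustrative, not taken from any of the listed papers.

```python
import random
from collections import Counter

def empirical_tv_error(true_p, m, rng):
    """Draw m samples from true_p, form the empirical distribution,
    and return its total variation distance to true_p."""
    k = len(true_p)
    counts = Counter(rng.choices(range(k), weights=true_p, k=m))
    return sum(abs(counts[i] / m - true_p[i]) for i in range(k)) / 2

rng = random.Random(0)
k, eps = 100, 0.1
true_p = [1 / k] * k        # uniform distribution over [k]
m = int(k / eps ** 2)       # Theta(k / eps^2) samples
err = empirical_tv_error(true_p, m, rng)
print(err)  # comfortably below eps
```

With $m = k/\varepsilon^2$ the empirical distribution's TV error concentrates around $\sqrt{k/m} \cdot \text{const} = O(\varepsilon)$, which is what the run illustrates.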
This list is automatically generated from the titles and abstracts of the papers on this site.
The quality of this automatically generated information is not guaranteed, and this site accepts no responsibility for any consequences of its use.