Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu
- URL: http://arxiv.org/abs/2011.04144v2
- Date: Thu, 22 Jul 2021 06:37:12 GMT
- Title: Near-Optimal Learning of Tree-Structured Distributions by Chow-Liu
- Authors: Arnab Bhattacharyya, Sutanu Gayen, Eric Price, N. V. Vinodchandran
- Abstract summary: We provide finite sample guarantees for the classical ChowLiu algorithm (IEEE Trans.Inform.Theory, 1968)
We show that for a specific tree $T$, with $widetildeO (|Sigma|2nvarepsilon-1)$ samples from a distribution $P$ over $Sigman$, one can efficiently learn the closest KL divergence.
- Score: 14.298220510927695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We provide finite sample guarantees for the classical Chow-Liu algorithm
(IEEE Trans.~Inform.~Theory, 1968) to learn a tree-structured graphical model
of a distribution. For a distribution $P$ on $\Sigma^n$ and a tree $T$ on $n$
nodes, we say $T$ is an $\varepsilon$-approximate tree for $P$ if there is a
$T$-structured distribution $Q$ such that $D(P\;||\;Q)$ is at most
$\varepsilon$ more than the best possible tree-structured distribution for $P$.
We show that if $P$ itself is tree-structured, then the Chow-Liu algorithm with
the plug-in estimator for mutual information with $\widetilde{O}(|\Sigma|^3
n\varepsilon^{-1})$ i.i.d.~samples outputs an $\varepsilon$-approximate tree
for $P$ with constant probability. In contrast, for a general $P$ (which may
not be tree-structured), $\Omega(n^2\varepsilon^{-2})$ samples are necessary to
find an $\varepsilon$-approximate tree. Our upper bound is based on a new
conditional independence tester that addresses an open problem posed by
Canonne, Diakonikolas, Kane, and Stewart~(STOC, 2018): we prove that for three
random variables $X,Y,Z$ each over $\Sigma$, testing if $I(X; Y \mid Z)$ is $0$
or $\geq \varepsilon$ is possible with $\widetilde{O}(|\Sigma|^3/\varepsilon)$
samples. Finally, we show that for a specific tree $T$, with $\widetilde{O}
(|\Sigma|^2n\varepsilon^{-1})$ samples from a distribution $P$ over $\Sigma^n$,
one can efficiently learn the closest $T$-structured distribution in KL
divergence by applying the add-1 estimator at each node.
Related papers
- Efficient Sampling of Stochastic Differential Equations with Positive
Semi-Definite Models [91.22420505636006]
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix.
It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $varepsilon$ with a cost that is $m2 d log (1/varepsilon)$
Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv Detail & Related papers (2023-03-30T02:50:49Z) - A spectral least-squares-type method for heavy-tailed corrupted
regression with unknown covariance \& heterogeneous noise [2.019622939313173]
We revisit heavy-tailed corrupted least-squares linear regression assuming to have a corrupted $n$-sized label-feature sample of at most $epsilon n$ arbitrary outliers.
We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge on $(Sigma,Xi) nor the operator norm of $Xi$.
arXiv Detail & Related papers (2022-09-06T23:37:31Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $tildeO(k/sqrtepsilon)$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Tree density estimation [12.831051269764115]
density estimation for a random vector $boldsymbol X$ in $mathbb Rd$ with probability density $f(boldsymbol x)$.
For Lipschitz continuous $f$ with bounded support, $mathbb E int |f_n(boldsymbol x)-fT*(boldsymbol x)|dboldsymbol x=0$ a.s.
For Lipschitz continuous $f$ with bounded support, $mathbb E int |f_n(boldsymbol x)-f
arXiv Detail & Related papers (2021-11-23T16:05:59Z) - Coresets for Decision Trees of Signals [19.537354146654845]
We provide the first algorithm that outputs such a $(k,varepsilon)$-coreset for emphevery such matrix $D$.
This is by forging a link between decision trees from machine learning -- to partition trees in computational geometry.
arXiv Detail & Related papers (2021-10-07T05:49:55Z) - Learning low-degree functions from a logarithmic number of random
queries [77.34726150561087]
We prove that for any integer $ninmathbbN$, $din1,ldots,n$ and any $varepsilon,deltain(0,1)$, a bounded function $f:-1,1nto[-1,1]$ of degree at most $d$ can be learned.
arXiv Detail & Related papers (2021-09-21T13:19:04Z) - Self-training Converts Weak Learners to Strong Learners in Mixture
Models [86.7137362125503]
We show that a pseudolabeler $boldsymbolbeta_mathrmpl$ can achieve classification error at most $C_mathrmerr$.
We additionally show that by running gradient descent on the logistic loss one can obtain a pseudolabeler $boldsymbolbeta_mathrmpl$ with classification error $C_mathrmerr$ using only $O(d)$ labeled examples.
arXiv Detail & Related papers (2021-06-25T17:59:16Z) - Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication
Time [14.990725929840892]
We show an algorithm that runs in time $widetildeO(T(N, d) log kappa / mathrmpoly (varepsilon))$, where $T(N, d)$ is the time it takes to multiply a $d times N$ matrix by its transpose.
Our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors.
arXiv Detail & Related papers (2020-06-23T20:21:27Z) - Efficient Statistics for Sparse Graphical Models from Truncated Samples [19.205541380535397]
We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models.
For sparse linear regression, suppose samples $(bf x,y)$ are generated where $y = bf xtopOmega* + mathcalN(0,1)$ and $(bf x, y)$ is seen only if $y$ belongs to a truncation set $S subseteq mathbbRd$.
arXiv Detail & Related papers (2020-06-17T09:21:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $gamma in (0,1)$, and an approximation threshold $epsilon > 0$, we provide a model-free algorithm to learn an $epsilon$-optimal policy.
For small enough $epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.