Sample-Optimal and Efficient Learning of Tree Ising models
- URL: http://arxiv.org/abs/2010.14864v2
- Date: Sun, 29 Nov 2020 22:50:21 GMT
- Title: Sample-Optimal and Efficient Learning of Tree Ising models
- Authors: Constantinos Daskalakis and Qinxuan Pan
- Abstract summary: We show that $n$-variable tree-structured Ising models can be learned computationally-efficiently to within total variation distance $\epsilon$ from an optimal $O(n \ln n/\epsilon^2)$ samples.
Our guarantees do not follow from known results for the Chow-Liu algorithm.
- Score: 24.201827888085944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that $n$-variable tree-structured Ising models can be learned
computationally-efficiently to within total variation distance $\epsilon$ from
an optimal $O(n \ln n/\epsilon^2)$ samples, where $O(\cdot)$ hides an absolute
constant which, importantly, does not depend on the model being learned -
neither its tree nor the magnitude of its edge strengths, on which we place no
assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu [1968]
algorithm, using the plug-in estimator for estimating mutual information. While
this (or any other) algorithm may fail to identify the structure of the
underlying model correctly from a finite sample, we show that it will still
learn a tree-structured model that is $\epsilon$-close to the true one in total
variation distance, a guarantee called "proper learning."
Our guarantees do not follow from known results for the Chow-Liu algorithm
and the ensuing literature on learning graphical models, including a recent
renaissance of algorithms on this learning challenge, which only yield
asymptotic consistency results, or sample-inefficient and/or time-inefficient
algorithms, unless further assumptions are placed on the graphical model, such
as bounds on the "strengths" of the model's edges/hyperedges. While we
establish guarantees for a widely known and simple algorithm, the analysis that
this algorithm succeeds and is sample-optimal is quite complex, requiring a
hierarchical classification of the edges into layers with different
reconstruction guarantees, depending on their strength, combined with delicate
uses of the subadditivity of the squared Hellinger distance over graphical
models to control the error accumulation.
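For concreteness, here is a minimal sketch of the procedure the abstract describes: estimate every pairwise mutual information with the plug-in (empirical) estimator, then return a maximum-weight spanning tree over those estimates. The $\pm 1$ encoding of the spins, the Prim-style tree construction, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from itertools import combinations

def empirical_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) from samples with +/-1 entries."""
    xi, xj = samples[:, i], samples[:, j]
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((xi == a) & (xj == b))   # empirical joint
            p_a, p_b = np.mean(xi == a), np.mean(xj == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Maximum-weight spanning tree on plug-in mutual information (Prim-style)."""
    n = samples.shape[1]
    weights = {(i, j): empirical_mutual_information(samples, i, j)
               for i, j in combinations(range(n), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Pick the heaviest edge crossing the cut between tree and non-tree nodes.
        u, v = max(((i, j) for i, j in weights
                    if (i in in_tree) ^ (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append((u, v))
        in_tree.add(u if u not in in_tree else v)
    return edges
```

The learned distribution is the tree-structured model whose pairwise marginals on the returned edges match the empirical ones; the paper's guarantee is that this model is $\epsilon$-close in total variation to the true one even when the recovered tree differs from the true tree.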
Related papers
- Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models [6.54203362045253]
We study the problem of learning directed acyclic graphs from continuous observational data.
We develop a mixed-integer programming framework for learning medium-sized problems.
Our method outperforms state-of-the-art algorithms and is robust to noise heteroscedasticity.
arXiv Detail & Related papers (2024-04-19T02:42:13Z) - Learning bounded-degree polytrees with known skeleton [15.137372516678143]
We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees.
We provide an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph is known.
arXiv Detail & Related papers (2023-10-10T06:03:51Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - A cautionary tale on fitting decision trees to data from additive
models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Chow-Liu++: Optimal Prediction-Centric Learning of Tree Ising Models [30.62276301751904]
We introduce a new algorithm that combines elements of the Chow-Liu algorithm with tree metric reconstruction methods.
Our algorithm is robust to model misspecification and adversarial corruptions.
arXiv Detail & Related papers (2021-06-07T21:09:29Z) - Outlier-Robust Learning of Ising Models Under Dobrushin's Condition [57.89518300699042]
We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted.
Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees.
arXiv Detail & Related papers (2021-02-03T18:00:57Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
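The multiplicative-weights entry above refers to the generic Hedge-style reweighting step that the Klivans-Meka approach builds on. The sketch below shows only that textbook update, not the paper's adaptation to Gaussian graphical models; the step size and the toy losses are assumptions for illustration.

```python
import numpy as np

def hedge_update(weights, losses, eta=0.5):
    """One multiplicative-weights (Hedge) step: down-weight coordinates
    that incurred loss, then renormalize to a probability vector."""
    weights = weights * np.exp(-eta * losses)
    return weights / weights.sum()

# Toy usage: maintain a distribution over p candidate coordinates and
# repeatedly reweight them against per-round losses in [0, 1].
p = 5
w = np.ones(p) / p
for _ in range(100):
    losses = np.random.rand(p)   # stand-in for per-coordinate losses
    w = hedge_update(w, losses)
```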
This list is automatically generated from the titles and abstracts of the papers on this site.