Decentralized Learning of Tree-Structured Gaussian Graphical Models from
Noisy Data
- URL: http://arxiv.org/abs/2109.10642v1
- Date: Wed, 22 Sep 2021 10:41:18 GMT
- Title: Decentralized Learning of Tree-Structured Gaussian Graphical Models from
Noisy Data
- Authors: Akram Hussain
- Abstract summary: This paper studies the decentralized learning of tree-structured Gaussian graphical models (GGMs) from noisy data.
GGMs are widely used to model complex networks such as gene regulatory networks and social networks.
The proposed decentralized learning uses the Chow-Liu algorithm for estimating the tree-structured GGM.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper studies the decentralized learning of tree-structured Gaussian
graphical models (GGMs) from noisy data. In decentralized learning, data set is
distributed across different machines (sensors), and GGMs are widely used to
model complex networks such as gene regulatory networks and social networks.
The proposed decentralized learning uses the Chow-Liu algorithm for estimating
the tree-structured GGM.
In previous works, upper bounds on the probability of incorrect tree
structure recovery were given mostly without any practical noise for
simplification. While this paper investigates the effects of three common types
of noisy channels: Gaussian, Erasure, and binary symmetric channel. For
Gaussian channel case, to satisfy the failure probability upper bound $\delta >
0$ in recovering a $d$-node tree structure, our proposed theorem requires only
$\mathcal{O}(\log(\frac{d}{\delta}))$ samples for the smallest sample size
($n$) comparing to the previous literature \cite{Nikolakakis} with
$\mathcal{O}(\log^4(\frac{d}{\delta}))$ samples by using the positive
correlation coefficient assumption that is used in some important works in the
literature. Moreover, the approximately bounded Gaussian random variable
assumption does not appear in \cite{Nikolakakis}. Given some knowledge about
the tree structure, the proposed Algorithmic Bound will achieve obviously
better performance with small sample size (e.g., $< 2000$) comparing with
formulaic bounds. Finally, we validate our theoretical results by performing
simulations on synthetic data sets.
Related papers
- Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information [1.7419682548187605]
We develop a conditional mutual information tester for Gaussian random variables.
We show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information.
We also show that when the underlying Gaussian model is not known to be tree-structured, $widetildeTheta(n2varepsilon-2)$ samples are necessary.
arXiv Detail & Related papers (2024-11-18T12:25:34Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Convergence for score-based generative modeling with polynomial
complexity [9.953088581242845]
We prove the first convergence guarantees for the core mechanic behind Score-based generative modeling.
Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality.
We show that a predictor-corrector gives better convergence than using either portion alone.
arXiv Detail & Related papers (2022-06-13T14:57:35Z) - uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks.
Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting.
We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured
Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $|theta infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.