Related papers: Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information

URL: http://arxiv.org/abs/2411.11516v1
Date: Mon, 18 Nov 2024 12:25:34 GMT
Title: Efficient Sample-optimal Learning of Gaussian Tree Models via Sample-optimal Testing of Gaussian Mutual Information
Authors: Sutanu Gayen, Sanket Kale, Sayantan Sen
Abstract summary: We develop a conditional mutual information tester for Gaussian random variables. We show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information. We also show that when the underlying Gaussian model is not known to be tree-structured, $\widetilde{\Theta}(n^2\varepsilon^{-2})$ samples are necessary.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning high-dimensional distributions is a significant challenge in machine learning and statistics. Classical research has mostly concentrated on asymptotic analysis of such data under suitable assumptions. While existing works [Bhattacharyya et al.: SICOMP 2023, Daskalakis et al.: STOC 2021, Choo et al.: ALT 2024] focus on discrete distributions, the current work addresses the tree structure learning problem for Gaussian distributions, providing efficient algorithms with solid theoretical guarantees. This is crucial as real-world distributions are often continuous and differ from the discrete scenarios studied in prior works. In this work, we design a conditional mutual information tester for Gaussian random variables that can test whether two Gaussian random variables are independent, or their conditional mutual information is at least $\varepsilon$, for some parameter $\varepsilon \in (0,1)$ using $\mathcal{O}(\varepsilon^{-1})$ samples, which we show to be near-optimal. In contrast, an additive estimation would require $\Omega(\varepsilon^{-2})$ samples. Our upper bound technique uses linear regression on a pair of suitably transformed random variables. Importantly, we show that the chain rule of conditional mutual information continues to hold for the estimated (conditional) mutual information. As an application of such a mutual information tester, we give an efficient $\varepsilon$-approximate structure-learning algorithm for an $n$-variate Gaussian tree model that takes $\widetilde{\Theta}(n\varepsilon^{-1})$ samples, which we again show to be near-optimal. In contrast, when the underlying Gaussian model is not known to be tree-structured, we show that $\widetilde{\Theta}(n^2\varepsilon^{-2})$ samples are necessary and sufficient to output an $\varepsilon$-approximate tree structure. We perform extensive experiments that corroborate our theoretical convergence bounds.
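The two ingredients the abstract names, a Gaussian (conditional) mutual information estimate and a tree-structure learner built on top of it, can be illustrated with a short sketch. It relies on the standard identity $I(X;Y\mid Z) = -\tfrac{1}{2}\log(1-\rho_{XY\cdot Z}^2)$ for jointly Gaussian variables, where $\rho_{XY\cdot Z}$ is the partial correlation, computed here by correlating the residuals of regressing $X$ and $Y$ on $Z$. This is a naive plug-in estimator with a hypothetical $\varepsilon/2$ decision threshold (`mi_test`), and the tree step assumes a Chow-Liu-style maximum spanning tree over pairwise estimates; neither is the authors' algorithm, and the sketch carries none of the paper's $\mathcal{O}(\varepsilon^{-1})$ or $\widetilde{\Theta}(n\varepsilon^{-1})$ sample-complexity guarantees. Conditioning is limited to a single variable $Z$ so the example needs only the standard library.

```python
import math
import random
from itertools import combinations


def _mean(xs):
    return sum(xs) / len(xs)


def residuals(x, z):
    """Residuals of the least-squares fit x ~ a + b*z (simple linear regression)."""
    mx, mz = _mean(x), _mean(z)
    szz = sum((zi - mz) ** 2 for zi in z)
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / szz
    return [xi - mx - b * (zi - mz) for xi, zi in zip(x, z)]


def correlation(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y, z=None):
    """Plug-in estimate (in nats) of I(X;Y), or of I(X;Y|Z) when z is given.

    Assumes joint Gaussianity, so I = -0.5 * log(1 - rho^2), where rho is the
    correlation of X and Y (or their partial correlation given a single Z,
    obtained by correlating the regression residuals).
    """
    if z is not None:
        x, y = residuals(x, z), residuals(y, z)
    rho = correlation(x, y)
    # Clamp rho^2 away from 1 so the logarithm stays finite.
    return -0.5 * math.log(1.0 - min(rho * rho, 1.0 - 1e-12))


def mi_test(x, y, eps, z=None):
    """Hypothetical tester: True ("MI at least eps") if the estimate is >= eps/2."""
    return gaussian_mi(x, y, z) >= eps / 2


def chow_liu_tree(columns):
    """Maximum-weight spanning tree over pairwise MI estimates (Kruskal)."""
    n = len(columns)
    scored = sorted(
        ((gaussian_mi(columns[i], columns[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    parent = list(range(n))  # union-find forest over the n variables

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            parent[ri] = rj
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Simulated Gaussian chain X0 -> X1 -> X2 -> X3 with unit variances.
    random.seed(0)
    m = 5000
    x0 = [random.gauss(0, 1) for _ in range(m)]
    x1 = [0.8 * a + random.gauss(0, 0.6) for a in x0]
    x2 = [0.8 * a + random.gauss(0, 0.6) for a in x1]
    x3 = [0.8 * a + random.gauss(0, 0.6) for a in x2]
    print(chow_liu_tree([x0, x1, x2, x3]))
    print(mi_test(x0, x2, eps=0.1, z=x1))
```

On this simulated chain the adjacent correlation is 0.8 (true MI about 0.51 nats) while non-adjacent pairs are weaker, so with a few thousand samples one would expect the spanning tree to contain the chain edges (0, 1), (1, 2), (2, 3) in some order, and the tester to report $X_0$ and $X_2$ as conditionally independent given $X_1$. These are sampled estimates, not guaranteed outputs.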

Related papers

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
We propose to amortize the cost of sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})$. For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference.
arXiv (2025-02-10T19:49:54Z)
Nearly Optimal Robust Covariance and Scatter Matrix Estimation Beyond Gaussians
We study the problem of computationally efficient robust estimation of the covariance/scatter matrix of elliptical distributions. We obtain the first efficiently computable, nearly optimal robust covariance estimators that extend beyond the Gaussian case.
arXiv (2025-02-10T15:31:57Z)
Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians
We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set. We develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples.
arXiv (2024-10-02T15:21:07Z)
Convergence Analysis of Probability Flow ODE for Score-based Generative Models
We study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives. We prove that the total variation distance between the target and the generated data distributions can be bounded above by $\mathcal{O}(d^{3/4}\delta^{1/2})$ at the continuous-time level.
arXiv (2024-04-15T12:29:28Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv (2023-09-25T17:54:19Z)
Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\text{poly}(k,d,1/\alpha,1/\varepsilon,\log(1/\delta))$ samples are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs.
arXiv (2023-09-07T17:02:32Z)
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$. Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv (2023-07-13T18:59:28Z)
Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models
This paper deals with the problem of efficient sampling from a differential equation, given the drift function and the diffusion matrix. It is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a cost that is $m^2 d \log(1/\varepsilon)$. Our results suggest that as the true solution gets smoother, we can circumvent the curse of dimensionality without requiring any sort of convexity.
arXiv (2023-03-30T02:50:49Z)
Robust Mean Estimation Without Moments for Symmetric Distributions
We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently. We propose a sequence of efficient algorithms that approaches this optimal error. Our algorithms are based on a generalization of the well-known filtering technique.
arXiv (2023-02-21T17:52:23Z)
Random matrices in service of ML footprint: ternary random features with no performance loss
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$. We propose a novel random technique, called Ternary Random Feature (TRF). The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv (2021-10-05T09:33:49Z)
The Sample Complexity of Robust Covariance Testing
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$…
arXiv (2020-12-31T18:24:41Z)
Optimal Robust Linear Regression in Nearly Linear Time
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$. We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is…
arXiv (2020-07-16T06:44:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.