Estimation of Shortest Path Covariance Matrices
- URL: http://arxiv.org/abs/2011.09986v1
- Date: Thu, 19 Nov 2020 17:37:46 GMT
- Title: Estimation of Shortest Path Covariance Matrices
- Authors: Raj Kumar Maity and Cameron Musco
- Abstract summary: We study the sample complexity of estimating the covariance matrix $mathbfSigma in mathbbRdtimes d$ of a distribution $mathcal D$ over $mathbbRd$ given independent samples.
We give a very simple algorithm for estimating $mathbfSigma$ up to norm error $epsilon left|mathbfSigmaright|$ using just $O(sqrtD)$ entry complexity and $tilde O(r
- Score: 21.772761365915176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the sample complexity of estimating the covariance matrix
$\mathbf{\Sigma} \in \mathbb{R}^{d\times d}$ of a distribution $\mathcal D$
over $\mathbb{R}^d$ given independent samples, under the assumption that
$\mathbf{\Sigma}$ is graph-structured. In particular, we focus on shortest path
covariance matrices, where the covariance between any two measurements is
determined by the shortest path distance in an underlying graph with $d$ nodes.
Such matrices generalize Toeplitz and circulant covariance matrices and are
widely applied in signal processing applications, where the covariance between
two measurements depends on the (shortest path) distance between them in time
or space.
We focus on minimizing both the vector sample complexity: the number of
samples drawn from $\mathcal{D}$ and the entry sample complexity: the number of
entries read in each sample. The entry sample complexity corresponds to
measurement equipment costs in signal processing applications. We give a very
simple algorithm for estimating $\mathbf{\Sigma}$ up to spectral norm error
$\epsilon \left\|\mathbf{\Sigma}\right\|_2$ using just $O(\sqrt{D})$ entry
sample complexity and $\tilde O(r^2/\epsilon^2)$ vector sample complexity,
where $D$ is the diameter of the underlying graph and $r \le d$ is the rank of
$\mathbf{\Sigma}$. Our method is based on extending the widely applied idea of
sparse rulers for Toeplitz covariance estimation to the graph setting.
In the special case when $\mathbf{\Sigma}$ is a low-rank Toeplitz matrix, our
result matches the state-of-the-art, with a far simpler proof. We also give an
information theoretic lower bound matching our upper bound up to a factor $D$
and discuss some directions towards closing this gap.
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $mathRd suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Testing Closeness of Multivariate Distributions via Ramsey Theory [40.926523210945064]
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions.
Specifically, given sample access to two unknown distributions $mathbf p, mathbf q$ on $mathbb Rd$, we want to distinguish between the case that $mathbf p=mathbf q$ versus $|mathbf p-mathbf q|_A_k > epsilon$.
Our main result is the first closeness tester for this problem with em sub-learning sample complexity in any fixed dimension.
arXiv Detail & Related papers (2023-11-22T04:34:09Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $Omega(d1/2/(maxp, epsilon)2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO)
Online learning techniques to reduce the number of samples required in each round from $m$ to $1$, keeping the same sample.
A novel formulation of weighted GDRO, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Perfect Sampling from Pairwise Comparisons [26.396901523831534]
We study how to efficiently obtain perfect samples from a discrete distribution $mathcalD$ given access only to pairwise comparisons of elements of its support.
We design a Markov chain whose stationary distribution coincides with $mathcalD$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past.
arXiv Detail & Related papers (2022-11-23T11:20:30Z) - Sample Complexity Bounds for Learning High-dimensional Simplices in
Noisy Regimes [5.526935605535376]
We find a sample complexity bound for learning a simplex from noisy samples.
We show that as long as $mathrmSNRgeOmegaleft(K1/2right)$, the sample complexity of the noisy regime has the same order to that of the noiseless case.
arXiv Detail & Related papers (2022-09-09T23:35:25Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several paragraphs, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Multi-reference alignment in high dimensions: sample complexity and
phase transition [31.841289319809814]
Multi-reference alignment entails estimating a signal in $mathbbRL$ from its circularly-shifted and noisy copies.
Motivated by single-particle cryo-electron microscopy, we analyze the sample complexity of the problem in the high-dimensional regime.
Our analysis uncovers a phase transition phenomenon governed by the parameter $alpha = L/(sigma2log L)$, where $sigma2$ is the variance of the noise.
arXiv Detail & Related papers (2020-07-22T15:04:47Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.