Meta Learning for Support Recovery in High-dimensional Precision Matrix
Estimation
- URL: http://arxiv.org/abs/2006.12598v2
- Date: Sun, 21 Feb 2021 17:14:04 GMT
- Title: Meta Learning for Support Recovery in High-dimensional Precision Matrix
Estimation
- Authors: Qian Zhang and Yilin Zheng and Jean Honorio
- Abstract summary: We study meta learning for support (i.e., the set of non-zero entries) recovery in high-dimensional precision matrix estimation.
In our setup, each task has a different random true precision matrix, each with a possibly different support.
We prove a matching information-theoretic lower bound for the necessary number of samples, which is $n \in \Omega((\log N)/K)$, and thus, our algorithm is minimax optimal.
- Score: 31.41834200409171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study meta learning for support (i.e., the set of non-zero
entries) recovery in high-dimensional precision matrix estimation where we
reduce the sufficient sample complexity in a novel task with the information
learned from other auxiliary tasks. In our setup, each task has a different
random true precision matrix, each with a possibly different support. We assume
that the union of the supports of all the true precision matrices (i.e., the
true support union) is small in size. We propose to pool all the samples from
different tasks, and \emph{improperly} estimate a single precision matrix by
minimizing the $\ell_1$-regularized log-determinant Bregman divergence. We show
that with high probability, the support of the \emph{improperly} estimated
single precision matrix is equal to the true support union, provided a
sufficient number of samples per task $n \in O((\log N)/K)$, for
$N$-dimensional vectors and $K$ tasks. That is, one requires fewer samples per
task when more tasks are available. We prove a matching information-theoretic
lower bound for the necessary number of samples, which is $n \in \Omega((\log
N)/K)$, and thus, our algorithm is minimax optimal. Then for the novel task, we
prove that the minimization of the $\ell_1$-regularized log-determinant Bregman
divergence with the additional constraint that the support is a subset of the
estimated support union could reduce the sufficient sample complexity of
successful support recovery to $O(\log(|S_{\text{off}}|))$ where
$|S_{\text{off}}|$ is the number of off-diagonal elements in the support union
and is much less than $N$ for sparse matrices. We also prove a matching
information-theoretic lower bound of $\Omega(\log(|S_{\text{off}}|))$ for the
necessary number of samples. Synthetic experiments validate our theory.
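To make the two-stage procedure concrete, the sketch below pools all tasks' samples, fits a single (improper) precision matrix by minimizing the $\ell_1$-regularized log-determinant objective $\mathrm{tr}(\hat{\Sigma}\Theta) - \log\det\Theta + \lambda\|\Theta\|_{1,\mathrm{off}}$ to read off the estimated support union, and then re-estimates the novel task's precision matrix with its support restricted to that union. This is a minimal illustration under stated assumptions, not the authors' implementation: stage 1 assumes scikit-learn's graphical_lasso as the $\ell_1$-regularized log-determinant solver, stage 2 uses a simple projected proximal-gradient loop as a hypothetical stand-in for the paper's constrained estimator, and the function names pooled_support_union and constrained_novel_precision are illustrative.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def pooled_support_union(task_samples, alpha=0.1, tol=1e-4):
    """Stage 1: pool the samples of all K tasks, fit one (improper) precision
    matrix with the l1-regularized log-determinant objective, and return its
    support as the estimated support union."""
    pooled = np.vstack(task_samples)              # shape (K * n, N)
    emp_cov = np.cov(pooled, rowvar=False)        # pooled sample covariance
    _, precision = graphical_lasso(emp_cov, alpha=alpha)
    support = np.abs(precision) > tol
    np.fill_diagonal(support, True)               # diagonal entries always kept
    return support

def constrained_novel_precision(novel_samples, support, alpha=0.05,
                                step=0.1, n_iter=300):
    """Stage 2 (illustrative proxy): l1-penalized log-det estimation for the
    novel task, projecting entries outside the estimated support union to zero
    after every proximal step and backtracking to stay positive definite."""
    emp_cov = np.cov(novel_samples, rowvar=False)
    N = emp_cov.shape[0]
    theta = np.eye(N)                             # positive-definite start
    off_diag = ~np.eye(N, dtype=bool)
    for _ in range(n_iter):
        grad = emp_cov - np.linalg.inv(theta)     # gradient of tr(S T) - log det T
        t = step
        for _ in range(30):                       # backtracking to preserve PD
            cand = theta - t * grad
            # soft-threshold off-diagonal entries (l1 penalty) ...
            cand[off_diag] = np.sign(cand[off_diag]) * np.maximum(
                np.abs(cand[off_diag]) - t * alpha, 0.0)
            cand[~support] = 0.0                  # ... and enforce the support constraint
            cand = (cand + cand.T) / 2.0          # keep the iterate symmetric
            if np.linalg.eigvalsh(cand).min() > 1e-8:
                break
            t /= 2.0
        else:
            break                                 # could not stay positive definite; stop
        theta = cand
    return theta

# Hypothetical usage: K tasks with n samples each, plus a small novel task.
rng = np.random.default_rng(0)
tasks = [rng.standard_normal((50, 20)) for _ in range(10)]   # K=10, n=50, N=20
novel = rng.standard_normal((15, 20))                        # few novel-task samples
S_union = pooled_support_union(tasks)
theta_novel = constrained_novel_precision(novel, S_union)
```

In this sketch the hard zeroing of off-support entries plays the role of the paper's support-union constraint, while the pooled graphical lasso in stage 1 corresponds to the improper single-matrix estimate; regularization levels and thresholds are placeholder values, not the theoretically prescribed choices.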
Related papers
- Simple and Nearly-Optimal Sampling for Rank-1 Tensor Completion via Gauss-Jordan [49.1574468325115]
We revisit the sample and computational complexity of completing a rank-1 tensor in $\bigotimes_{i=1}^{N} \mathbb{R}^{d}$.
We present a characterization of the problem which admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems.
arXiv Detail & Related papers (2024-08-10T04:26:19Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Meta Learning for High-dimensional Ising Model Selection Using
$\ell_1$-regularized Logistic Regression [28.776950569604026]
We consider the meta learning problem for estimating the graphs associated with high-dimensional Ising models.
Our goal is to use the information learned from the auxiliary tasks in the learning of the novel task to reduce its sufficient sample complexity.
arXiv Detail & Related papers (2022-08-19T20:28:39Z) - Meta Sparse Principal Component Analysis [31.403997435274604]
We study meta learning for support (i.e., the set of non-zero entries) recovery in high-dimensional Principal Component Analysis.
We reduce the sufficient sample complexity in a novel task with the information that is learned from auxiliary tasks.
arXiv Detail & Related papers (2022-08-18T16:28:31Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on strong convexity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Multiple Support Recovery Using Very Few Measurements Per Sample [33.20967869233738]
We are given access to linear measurements of multiple sparse samples in $\mathbb{R}^d$.
These samples can be partitioned into $ell$ groups, with samples having the same support belonging to the same group.
For a given budget of $m$ measurements per sample, the goal is to recover the $ell$ underlying supports.
arXiv Detail & Related papers (2021-05-20T16:02:27Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$, matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension, and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - A Randomized Algorithm to Reduce the Support of Discrete Measures [79.55586575988292]
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms.
We give a simple geometric characterization of barycenters via negative cones and derive a randomized algorithm that computes this new measure by "greedy geometric sampling".
We then study its properties, and benchmark it on synthetic and real-world data to show that it can be very beneficial in the $N\gg n$ regime.
arXiv Detail & Related papers (2020-06-02T16:38:36Z)