Meta Sparse Principal Component Analysis
- URL: http://arxiv.org/abs/2208.08938v2
- Date: Fri, 19 Aug 2022 10:28:15 GMT
- Title: Meta Sparse Principal Component Analysis
- Authors: Imon Banerjee and Jean Honorio
- Abstract summary: We study meta-learning for support (i.e. the set of non-zero entries) recovery in high-dimensional Principal Component Analysis.
We reduce the sufficient sample complexity in a novel task using the information learned from auxiliary tasks.
- Score: 31.403997435274604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study meta-learning for support (i.e. the set of non-zero entries)
recovery in high-dimensional Principal Component Analysis. We reduce the
sufficient sample complexity in a novel task using the information learned
from auxiliary tasks. We assume that each task has a different random
Principal Component (PC) matrix, each with a possibly different support, and
that the support union of the PC matrices is small. We then pool the data from all the
tasks to execute an improper estimation of a single PC matrix by maximising the
$l_1$-regularised predictive covariance to establish that with high probability
the true support union can be recovered provided a sufficient number of tasks
$m$ and a sufficient number of samples $ O\left(\frac{\log(p)}{m}\right)$ for
each task, for $p$-dimensional vectors. Then, for a novel task, we prove that
the maximisation of the $l_1$-regularised predictive covariance with the
additional constraint that the support is a subset of the estimated support
union reduces the sufficient sample complexity of successful support
recovery to $O(\log |J|)$, where $J$ is the support union recovered from the
auxiliary tasks. Typically, $|J|$ would be much less than $p$ for sparse
matrices. Finally, we validate our theoretical results through numerical
simulations.
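Below is a minimal, illustrative sketch of the two-stage idea on synthetic spiked-covariance data. It is not the authors' code: scikit-learn's SparsePCA is used only as a stand-in for the paper's $l_1$-regularised predictive-covariance maximisation, and all dimensions, penalties and thresholds are arbitrary choices rather than values from the paper.

```python
# Illustrative two-stage sketch (not the authors' implementation):
# SparsePCA stands in for the l1-regularised predictive-covariance
# maximisation; sizes, spike strengths and thresholds are arbitrary.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
p, m, n_per_task = 50, 20, 30          # dimension, auxiliary tasks, samples per task
J_true = np.arange(5)                  # true support union (assumed small)

def spiked_samples(n, v, strength=3.0):
    """Draw n samples from an identity-plus-spike covariance along direction v."""
    return rng.normal(size=(n, p)) + strength * rng.normal(size=(n, 1)) * v

# Auxiliary tasks: each leading PC is sparse with support inside J_true.
tasks = []
for _ in range(m):
    v = np.zeros(p)
    v[J_true] = rng.normal(size=J_true.size)
    v /= np.linalg.norm(v)
    tasks.append(spiked_samples(n_per_task, v))

# Stage 1: pool all auxiliary data and estimate the support union J.
X_pool = np.vstack(tasks)
pc_pool = SparsePCA(n_components=1, alpha=1.0, random_state=0).fit(X_pool)
J_hat = np.flatnonzero(np.abs(pc_pool.components_[0]) > 1e-8)

# Stage 2: novel task with few samples, estimated only on the coordinates in J_hat.
v_new = np.zeros(p)
v_new[J_true[:3]] = 1.0 / np.sqrt(3)
X_new = spiked_samples(10, v_new)
pc_new = SparsePCA(n_components=1, alpha=0.5, random_state=0).fit(X_new[:, J_hat])
support_new = J_hat[np.abs(pc_new.components_[0]) > 1e-8]

print("estimated support union:", J_hat)
print("novel-task support:", support_new)
```

The point mirrored by the sketch is that the novel task is estimated only over the $|J|$ coordinates returned by the pooled stage, which is what the paper's $O(\log |J|)$ sample-complexity result exploits.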
Related papers
- Optimal level set estimation for non-parametric tournament and crowdsourcing problems [49.75262185577198]
Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions.
In this paper, we assume that the matrix $M$ containing the probability that expert $i$ answers correctly to question $j$ is bi-isotonic up to a permutation of its rows and columns.
We construct a computationally efficient algorithm that turns out to be minimax optimal for this classification problem.
arXiv Detail & Related papers (2024-08-27T18:28:31Z) - Meta Learning for High-dimensional Ising Model Selection Using
$\ell_1$-regularized Logistic Regression [28.776950569604026]
We consider the meta learning problem for estimating the graphs associated with high-dimensional Ising models.
Our goal is to use the information learned from the auxiliary tasks when learning the novel task, reducing its sufficient sample complexity (a sketch of the underlying $\ell_1$-regularized neighborhood selection appears after this list).
arXiv Detail & Related papers (2022-08-19T20:28:39Z) - On the Power of Multitask Representation Learning in Linear MDP [61.58929164172968]
This paper presents an analysis of the statistical benefit of multitask representation learning in linear Markov Decision Processes (MDPs).
We first discover a Least-Activated-Feature-Abundance (LAFA) criterion, denoted as $\kappa$, with which we prove that a straightforward least-squares algorithm learns a policy which is $\tilde{O}\left(H^2\sqrt{\frac{\kappa\,\mathcal{C}(\Phi)^2\,\kappa d}{NT}}+\frac{\kappa d}{n}\right)$
arXiv Detail & Related papers (2021-06-15T11:21:06Z) - Multiple Support Recovery Using Very Few Measurements Per Sample [33.20967869233738]
We are given access to linear measurements of multiple sparse samples in $\mathbb{R}^d$.
These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group.
For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports.
arXiv Detail & Related papers (2021-05-20T16:02:27Z) - Sample Efficient Linear Meta-Learning by Alternating Minimization [74.40553081646995]
We study a simple alternating minimization method (MLLAM) which alternately learns the low-dimensional subspace and the regressors (a generic alternating-minimization sketch appears after this list).
We show that for a constant subspace dimension MLLAM obtains nearly-optimal estimation error, despite requiring only $\Omega(\log d)$ samples per task.
We propose a novel task subset selection scheme that ensures the same strong statistical guarantee as MLLAM.
arXiv Detail & Related papers (2021-05-18T06:46:48Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to normalized $L_1$ distance requires $\Omega\left(\frac{kr}{\epsilon^2}\right)$ samples.
We propose an algorithm that uses $\mathcal{O}\left(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon}\right)$ samples, a number linear in the high dimension and nearly linear in the typically low rank of the matrices.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Sampling from a $k$-DPP without looking at all items [58.30573872035083]
Given a kernel function and a subset size $k$, our goal is to sample $k$ out of $n$ items with probability proportional to the determinant of the kernel matrix induced by the subset (a.k.a. $k$-DPP)
Existing $k$-DPP sampling algorithms require an expensive preprocessing step which involves multiple passes over all $n$ items, making it infeasible for large datasets.
We develop an algorithm which adaptively builds a sufficiently large uniform sample of data that is then used to efficiently generate a smaller set of $k$ items.
arXiv Detail & Related papers (2020-06-30T16:40:44Z) - Meta Learning for Support Recovery in High-dimensional Precision Matrix
Estimation [31.41834200409171]
We study meta learning for support (i.e., the set of non-zero entries) recovery in high-dimensional precision matrix estimation.
In our setup, each task has a different random true precision matrix, each with a possibly different support.
We prove a matching information-theoretic lower bound for the necessary number of samples, which is $n \in \Omega\left((\log N)/K\right)$, and thus, our algorithm is minimax optimal.
arXiv Detail & Related papers (2020-06-22T20:24:52Z) - On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t\, C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z)
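Referring to the Ising-model selection entry above: the following is a hedged sketch of the classical single-task $\ell_1$-regularized logistic-regression neighborhood selection that such methods build on, not the meta-learning procedure of that paper. The chain graph, the crude Gibbs sampler, and the regularization constant are illustrative assumptions.

```python
# Hedged sketch of l1-regularized neighborhood selection for Ising graphs
# (the single-task building block, NOT the meta-learning algorithm above).
# Gibbs sampling setup and all constants are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p, n, burn = 8, 2000, 200
theta = np.zeros((p, p))
for j in range(p - 1):                      # chain graph with positive couplings
    theta[j, j + 1] = theta[j + 1, j] = 0.6

def gibbs_sample(n_samples):
    """Crude Gibbs sampler for P(s) proportional to exp(0.5 * s' theta s), s in {-1,+1}^p."""
    s = rng.choice([-1.0, 1.0], size=p)
    out = []
    for it in range(burn + n_samples):
        for j in range(p):
            field = theta[j] @ s - theta[j, j] * s[j]
            prob_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[j] = 1.0 if rng.random() < prob_plus else -1.0
        if it >= burn:
            out.append(s.copy())
    return np.array(out)

X = gibbs_sample(n)

# Neighborhood selection: l1-logistic regression of each node on the others.
adj = np.zeros((p, p), dtype=bool)
for j in range(p):
    others = np.delete(np.arange(p), j)
    clf = LogisticRegression(penalty="l1", C=0.3, solver="liblinear")
    clf.fit(X[:, others], X[:, j])
    adj[j, others] = np.abs(clf.coef_[0]) > 1e-6

edges = np.argwhere(np.triu(adj & adj.T))   # AND rule: keep mutually selected pairs
print("recovered edges:", [tuple(e) for e in edges])
```

The AND rule at the end is one common way to symmetrize the per-node estimates; the meta-learning paper above studies how auxiliary tasks can reduce the per-task sample complexity of this kind of estimator.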
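Referring to the MLLAM entry above: here is a generic alternating-minimization sketch for linear meta-learning under a shared low-dimensional subspace, written from the one-sentence description only. It is not the paper's MLLAM algorithm or its task-subset selection scheme, and all sizes, noise levels, and iteration counts are arbitrary.

```python
# Generic alternating-minimization sketch for linear meta-learning (in the
# spirit of the MLLAM entry, NOT the paper's algorithm): tasks share a
# low-dimensional subspace U, each task has its own regressor w_t, and we
# alternate between solving for the regressors and for the subspace.
import numpy as np

rng = np.random.default_rng(1)
d, k, T, n = 20, 2, 15, 25                 # ambient dim, subspace dim, tasks, samples/task

U_true, _ = np.linalg.qr(rng.normal(size=(d, k)))
tasks = []
for _ in range(T):
    w = rng.normal(size=k)
    X = rng.normal(size=(n, d))
    y = X @ U_true @ w + 0.1 * rng.normal(size=n)
    tasks.append((X, y))

U = np.linalg.qr(rng.normal(size=(d, k)))[0]   # random orthonormal start
for _ in range(50):
    # Step 1: fix U, solve a small least-squares problem per task for w_t.
    W = [np.linalg.lstsq(X @ U, y, rcond=None)[0] for X, y in tasks]
    # Step 2: fix all w_t, solve for U (linear in vec(U)) and re-orthonormalize.
    A = np.vstack([np.kron(w[None, :], X) for (X, y), w in zip(tasks, W)])
    b = np.concatenate([y for _, y in tasks])
    U = np.linalg.lstsq(A, b, rcond=None)[0].reshape(d, k, order="F")
    U = np.linalg.qr(U)[0]

# Subspace error: distance between the estimated and true projection matrices.
err = np.linalg.norm(U @ U.T - U_true @ U_true.T)
print("subspace estimation error:", err)
```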