Scalable Bigraphical Lasso: Two-way Sparse Network Inference for Count Data
- URL: http://arxiv.org/abs/2203.07912v1
- Date: Tue, 15 Mar 2022 13:50:49 GMT
- Title: Scalable Bigraphical Lasso: Two-way Sparse Network Inference for Count Data
- Authors: Sijia Li, Martín López-García, Neil D. Lawrence, Luisa Cutillo
- Abstract summary: We exploit the eigenvalue decomposition of the Cartesian product graph to present a more efficient version of the Bigraphical Lasso algorithm. Our methodology accounts for the dependencies across both instances and features and reduces the computational complexity for high-dimensional data.
- Score: 11.762284639312613
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Classically, statistical datasets have a larger number of data points than
features ($n > p$). The standard model of classical statistics caters for the
case where data points are considered conditionally independent given the
parameters. However, for $n\approx p$ or $p > n$ such models are poorly
determined. Kalaitzis et al. (2013) introduced the Bigraphical Lasso, an
estimator for sparse precision matrices based on the Cartesian product of
graphs. Unfortunately, the original Bigraphical Lasso algorithm is not
applicable in the case of large $p$ and $n$ due to its memory requirements. We exploit
eigenvalue decomposition of the Cartesian product graph to present a more
efficient version of the algorithm which reduces memory requirements from
$O(n^2p^2)$ to $O(n^2 + p^2)$. Many datasets in different application fields,
such as biology, medicine and social science, come with count data, for which
Gaussian-based models are not applicable. Our multi-way network inference
approach can be used for discrete data. Our methodology accounts for the
dependencies across both instances and features, reduces the computational
complexity for high-dimensional data, and enables us to handle both discrete
and continuous data. Numerical studies on both synthetic and real datasets are
presented to showcase the performance of our method.
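The memory reduction described in the abstract rests on a standard property of the Kronecker sum: the precision matrix of a Cartesian product graph is $\Psi \oplus \Theta = \Psi \otimes I_p + I_n \otimes \Theta$, whose eigenvalues are exactly the pairwise sums of the eigenvalues of the two factors. The following NumPy sketch (an illustration of that identity, not the authors' implementation; variable names are hypothetical) shows why only the $n \times n$ and $p \times p$ factors ever need to be stored:

```python
import numpy as np

n, p = 4, 3
rng = np.random.default_rng(0)

def random_spd(d):
    """Return a random symmetric positive-definite d x d matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Psi, Theta = random_spd(n), random_spd(p)  # the two small graph factors

# Eigendecompose the factors separately: O(n^2 + p^2) memory.
lam = np.linalg.eigvalsh(Psi)
mu = np.linalg.eigvalsh(Theta)
kron_sum_eigs = np.sort(np.add.outer(lam, mu).ravel())

# Check against the explicit Kronecker sum, which needs O(n^2 p^2)
# memory and is only feasible here because n and p are tiny.
K = np.kron(Psi, np.eye(p)) + np.kron(np.eye(n), Theta)
direct_eigs = np.sort(np.linalg.eigvalsh(K))

assert np.allclose(kron_sum_eigs, direct_eigs)
```

Because the spectrum of the $np \times np$ joint precision matrix is recovered from the two small eigendecompositions, operations that depend only on eigenvalues (determinants, traces of inverses) never require materializing $K$.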
Related papers
- Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Samples and Features [0.30723404270319693]
We introduce a method that has $O(n^2)$ runtime and $O(n)$ space complexity, without assuming independence.
We demonstrate that our approach can be used on unprecedentedly large datasets, such as a real-world 1,000,000-cell scRNA-seq dataset.
arXiv Detail & Related papers (2024-07-29T11:15:25Z) - Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic [99.3682210827572]
Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets.
Data curation strategies are typically developed agnostic of the available compute for training.
We introduce neural scaling laws that account for the non-homogeneous nature of web data.
arXiv Detail & Related papers (2024-04-10T17:27:54Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Compressive Recovery of Sparse Precision Matrices [5.557600489035657]
We consider the problem of learning a graph modeling the statistical relations of the $d$ variables from a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
We show that it is possible to estimate it from a sketch of size $m = \Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph.
We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser.
arXiv Detail & Related papers (2023-11-08T13:29:08Z) - Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
arXiv Detail & Related papers (2023-07-18T15:49:02Z) - Easy Differentially Private Linear Regression [16.325734286930764]
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z) - Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z) - How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.