CARE: Large Precision Matrix Estimation for Compositional Data
- URL: http://arxiv.org/abs/2309.06985v2
- Date: Fri, 22 Mar 2024 06:37:39 GMT
- Title: CARE: Large Precision Matrix Estimation for Compositional Data
- Authors: Shucong Zhang, Huiyuan Wang, Wei Lin
- Abstract summary: We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart.
By exploiting this connection, we propose a composition adaptive regularized estimation (CARE) method for estimating the sparse basis precision matrix.
Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis.
- Score: 9.440956168571617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional compositional data are prevalent in many applications. The simplex constraint poses intrinsic challenges to inferring the conditional dependence relationships among the components forming a composition, as encoded by a large precision matrix. We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart, which is shown to be asymptotically identifiable under suitable sparsity assumptions. By exploiting this connection, we propose a composition adaptive regularized estimation (CARE) method for estimating the sparse basis precision matrix. We derive rates of convergence for the estimator and provide theoretical guarantees on support recovery and data-driven parameter tuning. Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis. In particular, in sufficiently high dimensions, the CARE estimator achieves minimax optimality and performs as well as if the basis were observed. We further discuss how our framework can be extended to handle data containing zeros, including sampling zeros and structural zeros. The advantages of CARE over existing methods are illustrated by simulation studies and an application to inferring microbial ecological networks in the human gut.
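The link between a composition and its basis mentioned in the abstract can be illustrated with a standard identity from log-ratio analysis. The following is a minimal sketch, not the CARE estimator itself: it assumes the composition is obtained by normalizing a positive basis, and uses the centered log-ratio (clr) transform. The names `clr`, `centering_matrix`, and `max_gap_identity` are hypothetical helpers introduced here for illustration. Under these assumptions, the clr covariance equals G Σ G with G = I - 11ᵀ/p, where Σ is the covariance of the log-basis. This matrix is singular, so it cannot be inverted directly, but for an identity Σ its largest entrywise deviation from Σ is 1/p, which shrinks as the dimension grows.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform, applied row-wise to compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def centering_matrix(p):
    """G = I - (1/p) * 1 1^T, the matrix that subtracts the row mean."""
    return np.eye(p) - np.ones((p, p)) / p

# Simulated log-basis (illustrative data only, not from the paper).
rng = np.random.default_rng(0)
n, p = 5, 4
log_basis = rng.normal(size=(n, p))
basis = np.exp(log_basis)

# Compositions live on the simplex: each row sums to one.
composition = basis / basis.sum(axis=1, keepdims=True)

# clr of the composition equals the centered log-basis, so
# Cov(clr) = G @ Sigma @ G, where Sigma is the log-basis covariance.
G = centering_matrix(p)
clr_matches = np.allclose(clr(composition), log_basis @ G)

# G annihilates the all-ones vector, so G @ Sigma @ G is singular.
is_singular = np.allclose(G @ np.ones(p), 0.0)

def max_gap_identity(p):
    """Largest entrywise gap between G @ Sigma @ G and Sigma for Sigma = I."""
    G = centering_matrix(p)
    sigma = np.eye(p)
    return np.abs(G @ sigma @ G - sigma).max()

if __name__ == "__main__":
    print(clr_matches, is_singular)
    for dim in (10, 100, 1000):
        print(dim, max_gap_identity(dim))
```

The identity-covariance case is chosen only because the gap can be computed in closed form; the rates and identifiability conditions stated in the abstract are not reproduced by this sketch.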
Related papers
- Nonparametric Linear Discriminant Analysis for High Dimensional Matrix-Valued Data [0.0]
We propose a novel extension of Fisher's Linear Discriminant Analysis (LDA) tailored for matrix-valued observations.
We adopt a nonparametric empirical Bayes framework based on Nonparametric Maximum Likelihood Estimation (NPMLE).
Our method is effectively generalized to the matrix setting, thereby improving classification performance.
arXiv Detail & Related papers (2025-07-25T07:30:24Z) - Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives [14.726751780239907]
We propose an estimator to extract compact sets of decision rules from tree ensembles.
A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule.
We demonstrate that our estimator outperforms existing algorithms for rule extraction.
arXiv Detail & Related papers (2025-06-25T04:06:37Z) - Probabilistic Iterative Hard Thresholding for Sparse Learning [2.5782973781085383]
We present an approach towards solving expectation objective optimization problems with cardinality constraints.
We prove convergence of the underlying process, and demonstrate the performance on two Machine Learning problems.
arXiv Detail & Related papers (2024-09-02T18:14:45Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach [27.301741710016223]
We consider inferential questions associated with the missing data version of panel data induced by staggered adoption.
We develop and analyze a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage.
We prove non-asymptotic and high-probability bounds on its error in estimating each missing entry.
arXiv Detail & Related papers (2024-01-24T18:58:18Z) - Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Accelerated structured matrix factorization [0.0]
Matrix factorization exploits the idea that, in complex high-dimensional data, the actual signal typically lies in lower-dimensional structures.
By exploiting Bayesian shrinkage priors, we devise a computationally convenient approach for high-dimensional matrix factorization.
The dependence between row and column entities is modeled by inducing flexible sparse patterns within factors.
arXiv Detail & Related papers (2022-12-13T11:35:01Z) - Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on normalizing flows (NF).
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Rigid and Articulated Point Registration with Expectation Conditional Maximization [20.096170794358315]
We introduce an innovative EM-like algorithm, namely the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm.
We analyse in detail the associated consequences in terms of estimation of the registration parameters.
We extend rigid registration to articulated registration.
arXiv Detail & Related papers (2020-12-09T17:36:11Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Nonconvex Matrix Completion with Linearly Parameterized Factors [10.163102766021373]
Parametric Factorization holds for important examples including subspace and completion simulations.
The effectiveness of our unified nonconstrained matrix optimization method is also illustrated.
arXiv Detail & Related papers (2020-03-29T22:40:47Z) - Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data [4.550919471480445]
We develop a data-driven smoothing technique for high-dimensional and non-linear panel data models.
The weights are determined in a data-driven way and depend on the similarity between the corresponding functions.
We conduct a simulation study which shows that the prediction can be greatly improved by using our estimator.
arXiv Detail & Related papers (2019-12-30T09:50:58Z)