Effective Data-aware Covariance Estimator from Compressed Data
- URL: http://arxiv.org/abs/2010.04966v1
- Date: Sat, 10 Oct 2020 10:10:28 GMT
- Title: Effective Data-aware Covariance Estimator from Compressed Data
- Authors: Xixian Chen, Haiqin Yang, Shenglin Zhao, Michael R. Lyu, and Irwin
King
- Abstract summary: We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
- Score: 63.16042585506435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating covariance matrix from massive high-dimensional and distributed
data is significant for various real-world applications. In this paper, we
propose a data-aware weighted sampling based covariance matrix estimator,
namely DACE, which can provide an unbiased covariance matrix estimation and
attain more accurate estimation under the same compression ratio. Moreover, we
extend our proposed DACE to tackle multiclass classification problems with
theoretical justification and conduct extensive experiments on both synthetic
and real-world datasets to demonstrate the superior performance of our DACE.
Related papers
- Efficient adjustment for complex covariates: Gaining efficiency with
DOPE [56.537164957672715]
We propose a framework that accommodates adjustment for any subset of information expressed by the covariates.
Based on our theoretical results, we propose the Debiased Outcome-adapted Propensity Estorimator (DOPE) for efficient estimation of the average treatment effect (ATE)
Our results show that the DOPE provides an efficient and robust methodology for ATE estimation in various observational settings.
arXiv Detail & Related papers (2024-02-20T13:02:51Z) - Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Regularized Linear Discriminant Analysis Using a Nonlinear Covariance
Matrix Estimator [11.887333567383239]
Linear discriminant analysis (LDA) is a widely used technique for data classification.
LDA becomes inefficient when the data covariance matrix is ill-conditioned.
Regularized LDA methods have been proposed to cope with such a situation.
arXiv Detail & Related papers (2024-01-31T11:37:14Z) - Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry [0.0]
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold.
We develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget.
Evaluations of our approach using data from physical applications demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
arXiv Detail & Related papers (2023-01-31T16:33:46Z) - Scalable Regularised Joint Mixture Models [2.0686407686198263]
In many applications, data can be heterogeneous in the sense of spanning latent groups with different underlying distributions.
We propose an approach for heterogeneous data that allows joint learning of (i) explicit multivariate feature distributions, (ii) high-dimensional regression models and (iii) latent group labels.
The approach is demonstrably effective in high dimensions, combining data reduction for computational efficiency with a re-weighting scheme that retains key signals even when the number of features is large.
arXiv Detail & Related papers (2022-05-03T13:38:58Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in
Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - CoinPress: Practical Private Mean and Covariance Estimation [18.6419638570742]
We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data.
We show that their error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods.
arXiv Detail & Related papers (2020-06-11T17:17:28Z) - Two-Dimensional Semi-Nonnegative Matrix Factorization for Clustering [50.43424130281065]
We propose a new Semi-Nonnegative Matrix Factorization method for 2-dimensional (2D) data, named TS-NMF.
It overcomes the drawback of existing methods that seriously damage the spatial information of the data by converting 2D data to vectors in a preprocessing step.
arXiv Detail & Related papers (2020-05-19T05:54:14Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Covariance Estimation for Matrix-valued Data [9.739753590548796]
We propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data.
We formulate a unified framework for estimating bandable covariance, and introduce an efficient algorithm based on rank one unconstrained Kronecker product approximation.
We demonstrate the superior finite-sample performance of our methods using simulations and real applications from a gridded temperature anomalies dataset and a S&P 500 stock data analysis.
arXiv Detail & Related papers (2020-04-11T02:15:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.