Fisher's combined probability test for high-dimensional covariance
matrices
- URL: http://arxiv.org/abs/2006.00426v1
- Date: Sun, 31 May 2020 03:32:26 GMT
- Title: Fisher's combined probability test for high-dimensional covariance
matrices
- Authors: Xiufan Yu, Danning Li, and Lingzhou Xue
- Abstract summary: We propose a scale-invariant power enhancement test based on Fisher's method to combine the p-values of quadratic form statistics and maximum form statistics.
We prove that the proposed combination method retains the correct size and boosts the power against more general alternatives.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Testing large covariance matrices is of fundamental importance in statistical
analysis with high-dimensional data. In the past decade, three types of test
statistics have been studied in the literature: quadratic form statistics,
maximum form statistics, and their weighted combination. It is known that
quadratic form statistics would suffer from low power against sparse
alternatives and maximum form statistics would suffer from low power against
dense alternatives. The weighted combination methods were introduced to enhance
the power of quadratic form statistics or maximum form statistics when the
weights are appropriately chosen. In this paper, we provide a new perspective
to exploit the full potential of quadratic form statistics and maximum form
statistics for testing high-dimensional covariance matrices. We propose a
scale-invariant power enhancement test based on Fisher's method to combine the
p-values of quadratic form statistics and maximum form statistics. After
carefully studying the asymptotic joint distribution of quadratic form
statistics and maximum form statistics, we prove that the proposed combination
method retains the correct asymptotic size and boosts the power against more
general alternatives. Moreover, we demonstrate the finite-sample performance in
simulation studies and a real application.
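As a rough illustration of the combination step described in the abstract, the following Python sketch applies Fisher's method to two p-values. It assumes the two p-values are (asymptotically) independent under the null hypothesis, so that the statistic -2(log p1 + log p2) can be compared with a chi-square distribution with 4 degrees of freedom; that assumption belongs to this sketch and is not spelled out in the abstract. The function name and arguments are hypothetical, and computing the quadratic form and maximum form statistics themselves is not shown.

```python
import math


def fisher_combine_two(p_quadratic, p_maximum):
    """Combine two p-values with Fisher's method (illustrative sketch).

    p_quadratic and p_maximum are hypothetical inputs standing for the
    p-values of a quadratic form test and a maximum form test.
    Returns (fisher_statistic, combined_p_value).
    """
    for p in (p_quadratic, p_maximum):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")

    # Fisher's statistic: -2 times the sum of the log p-values.
    statistic = -2.0 * (math.log(p_quadratic) + math.log(p_maximum))

    # Assumption: independent p-values, so the reference distribution is
    # chi-square with 4 degrees of freedom, whose survival function has
    # the closed form exp(-x/2) * (1 + x/2).
    combined_p = math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)
    return statistic, combined_p


if __name__ == "__main__":
    # One test is strongly significant, the other is not.
    stat, p = fisher_combine_two(0.01, 0.5)
    print(f"Fisher statistic = {stat:.4f}, combined p-value = {p:.4f}")
```

In this sketch a small p-value from either test is enough to pull the combined p-value down, which is the intuition behind combining a test suited to dense alternatives with one suited to sparse alternatives.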
Related papers
- Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms [19.616162116973637]
We develop a unified methodology for statistical inference via randomized sketching or projections.
The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm.
arXiv Detail & Related papers (2024-04-01T04:35:44Z)
- Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting [1.8416014644193064]
This study uses Monte Carlo simulations to quantify the interactions between the employed cross-validation method and the discriminative power of features.
The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used.
arXiv Detail & Related papers (2023-08-22T05:14:42Z)
- A Statistical View of Column Subset Selection [47.65143789184956]
We consider the problem of selecting a small subset of representative variables from a large dataset.
We show how to efficiently (1) perform CSS using only summary statistics from the original dataset; (2) perform CSS in the presence of missing and/or censored data; and (3) select the subset size for CSS in a hypothesis testing framework.
arXiv Detail & Related papers (2023-07-24T15:42:33Z)
- Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
- A Genetic Algorithm-based Framework for Learning Statistical Power Manifold [1.7205106391379026]
We propose a novel genetic algorithm-based framework for learning statistical power manifold.
For a multiple linear regression $F$-test, we show that the proposed algorithm learns the statistical power manifold much faster as compared to a brute-force approach.
arXiv Detail & Related papers (2022-09-01T04:15:42Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e. the empirical model error is the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- The UU-test for Statistical Modeling of Unimodal Data [0.20305676256390928]
We propose a technique called UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset.
A unique feature of this approach is that in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model.
arXiv Detail & Related papers (2020-08-28T08:34:28Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- A Robust Test for Elliptical Symmetry [2.030567625639093]
Ellipticity GoF tests are usually hard to analyze and often their statistical power is not particularly strong.
We develop a novel framework based on the exchangeable random variables calculus introduced by de Finetti.
arXiv Detail & Related papers (2020-06-05T08:51:16Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Marginal likelihood computation for model selection and hypothesis testing: an extensive review [66.37504201165159]
This article provides a comprehensive study of the state-of-the-art of the topic.
We highlight limitations, benefits, connections and differences among the different techniques.
Problems and possible solutions with the use of improper priors are also described.
arXiv Detail & Related papers (2020-05-17T18:31:58Z)