A Robust Test for Elliptical Symmetry
- URL: http://arxiv.org/abs/2006.03311v4
- Date: Fri, 14 Apr 2023 13:57:17 GMT
- Title: A Robust Test for Elliptical Symmetry
- Authors: Ilya Soloveychik
- Abstract summary: Ellipticity GoF tests are usually hard to analyze and often their statistical power is not particularly strong.
We develop a novel framework based on the exchangeable random variables calculus introduced by de Finetti.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most signal processing and statistical applications heavily rely on specific
data distribution models. Gaussian distributions, although the most
common choice, are inadequate in most real-world scenarios, as they fail to
account for data coming from heavy-tailed populations or contaminated by
outliers. Such problems call for the use of Robust Statistics. The robust
models and estimators are usually based on elliptical populations, making the
latter ubiquitous in all methods of robust statistics. To determine whether
such tools are applicable in any specific case, goodness-of-fit (GoF) tests are
used to verify the ellipticity hypothesis. Ellipticity GoF tests are usually
hard to analyze and often their statistical power is not particularly strong.
In this work, assuming the true covariance matrix is unknown, we design and
rigorously analyze a robust GoF test consistent against all alternatives to
ellipticity on the unit sphere. The proposed test is based on Tyler's estimator
and is formulated in terms of easily computable statistics of the data. For its
rigorous analysis, we develop a novel framework based on the exchangeable
random variables calculus introduced by de Finetti. Our findings are supported
by numerical simulations comparing them to other popular GoF tests and
demonstrating the significantly higher statistical power of the suggested
technique.
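The abstract states that the test is built on Tyler's estimator but does not give the test statistic itself. As a hedged illustration only, the sketch below computes Tyler's M-estimator of scatter with its standard fixed-point iteration, $\hat\Sigma \leftarrow \frac{p}{n}\sum_i x_i x_i^\top / (x_i^\top \hat\Sigma^{-1} x_i)$, normalized to trace $p$, and then whitens the data and projects it onto the unit sphere. It assumes the center of symmetry is known and equal to zero; the helper names `tyler_estimator` and `normalized_directions` and the simulated multivariate t data are hypothetical, and this is not the paper's test.

```python
import numpy as np


def tyler_estimator(X, tol=1e-8, max_iter=500):
    """Tyler's M-estimator of scatter via the standard fixed-point iteration.

    X is an (n, p) array of observations, assumed already centered at the
    (known) center of symmetry. Returns a (p, p) shape matrix with trace p.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        # Quadratic forms q_i = x_i^T Sigma^{-1} x_i, one per observation.
        q = np.einsum("ij,jk,ik->i", X, inv, X)
        # Sigma_new = (p / n) * sum_i x_i x_i^T / q_i
        new = (p / n) * (X.T / q) @ X
        # Tyler's estimator is defined only up to scale; fix trace = p.
        new *= p / np.trace(new)
        if np.linalg.norm(new - sigma, ord="fro") < tol:
            return new
        sigma = new
    return sigma


def normalized_directions(X, sigma):
    """Whiten X with the Cholesky factor of sigma and project rows onto the unit sphere."""
    L = np.linalg.cholesky(sigma)
    Z = np.linalg.solve(L, X.T).T  # rows are L^{-1} x_i
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


# Hypothetical example: multivariate t data (3 degrees of freedom), which is elliptical.
rng = np.random.default_rng(0)
n, p, dof = 5000, 3, 3
S = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 0.5]])
G = rng.standard_normal((n, p)) @ np.linalg.cholesky(S).T  # Gaussian rows with covariance S
X = G / np.sqrt(rng.chisquare(dof, size=n) / dof)[:, None]  # heavy-tailed rescaling

sigma_hat = tyler_estimator(X)
U = normalized_directions(X, sigma_hat)
print(np.round(sigma_hat, 2))
```

For elliptical data, the rows of `U` are approximately uniform on the unit sphere (exactly so if the true shape matrix replaced the estimate), so departures from uniformity of these directions signal departures from ellipticity. The fixed-point iteration is known to converge when $n > p$ and the data are in general position.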
Related papers
- Revisiting the Dataset Bias Problem from a Statistical Perspective
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias either by weighting the objective of each sample $n$ by $\frac{1}{p(u_n|b_n)}$ or by sampling that sample with a weight proportional to $\frac{1}{p(u_n|b_n)}$.
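The inverse-conditional-probability weighting described in this entry can be sketched as follows, under the assumption (not stated in the summary) that $u$ and $b$ are discrete and $p(u_n|b_n)$ is estimated by empirical frequencies. The function name `inverse_conditional_weights` and the toy data are hypothetical. The weights can either scale per-sample losses or drive resampling.

```python
import random
from collections import Counter


def inverse_conditional_weights(u, b):
    """Return w_n = 1 / p_hat(u_n | b_n) for every sample n.

    Assumption: u (class attribute) and b (non-class attribute) are discrete,
    and p_hat is the empirical frequency of u within each value of b.
    """
    joint = Counter(zip(u, b))  # counts of each (u, b) pair
    count_b = Counter(b)        # counts of each b value
    # 1 / (joint / count_b) == count_b / joint
    return [count_b[bn] / joint[(un, bn)] for un, bn in zip(u, b)]


# Toy data: class 0 dominates when b == 0, class 1 dominates when b == 1.
u = [0, 0, 0, 1, 1, 1, 1, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
weights = inverse_conditional_weights(u, b)
print(weights)

# Option 2 from the summary: resample indices with probability proportional to the weights.
rng = random.Random(0)
resampled = rng.choices(range(len(u)), weights=weights, k=len(u))
```

Within each value of $b$, every class receives the same total weight (the size of that $b$ group), which removes the correlation between $u$ and $b$ in the weighted data.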
Date: 2024-02-05T22:58:06Z
- Precise Error Rates for Computationally Efficient Testing
We revisit the question of simple-versus-simple hypothesis testing with an eye towards computational complexity.
An existing test based on linear spectral statistics achieves the best possible tradeoff curve between type I and type II error rates.
Date: 2023-11-01T04:41:16Z
- Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting
This study uses Monte Carlo simulations to quantify the interactions between the employed cross-validation method and the discriminative power of features.
The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used.
Date: 2023-08-22T05:14:42Z
- Composite Goodness-of-fit Tests with Kernels
We propose kernel-based hypothesis tests for the challenging composite testing problem.
Our tests make use of minimum distance estimators based on the maximum mean discrepancy and the kernel Stein discrepancy.
As our main result, we show that we are able to estimate the parameter and conduct our test on the same data, while maintaining a correct test level.
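This entry relies on the maximum mean discrepancy (MMD). As a generic, hedged illustration of that quantity only, the sketch below computes the standard unbiased estimate of the squared MMD between two samples with a Gaussian kernel. It is not the paper's minimum-distance estimator or composite test; the bandwidth of 1.0 and the function names `gaussian_kernel` and `mmd2_unbiased` are hypothetical choices.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gram matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X (m, d) and Y (n, d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Drop the diagonal k(x_i, x_i) terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
Y = rng.standard_normal((300, 2))        # same distribution as X
Z = rng.standard_normal((300, 2)) + 2.0  # mean shifted by (2, 2)
print(mmd2_unbiased(X, Y), mmd2_unbiased(X, Z))
```

The estimate fluctuates around zero when both samples come from the same distribution and is clearly positive when they differ, which is the property kernel GoF tests exploit.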
Date: 2021-11-19T15:25:06Z
- SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
Date: 2021-03-23T17:48:56Z
- Least Squares Estimation Using Sketched Data with Heteroskedastic Errors
We show that estimates using data sketched by random projections will behave as if the errors were homoskedastic.
Inference, including first-stage F tests for instrument relevance, can be simpler than the full sample case if the sketching scheme is appropriately chosen.
Date: 2020-07-15T15:58:27Z
- Good Classifiers are Abundant in the Interpolating Regime
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
Date: 2020-06-22T21:12:31Z
- Stable Prediction via Leveraging Seed Variable
Previous machine learning methods may exploit subtle spurious correlations, induced by non-causal variables in the training data, for prediction.
We propose an algorithm based on conditional independence tests that separates causal variables, using a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
Date: 2020-06-09T06:56:31Z
- A Causal Direction Test for Heterogeneous Populations
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
Date: 2020-06-08T18:59:14Z
- Balance-Subsampled Stable Prediction
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
Date: 2020-06-08T07:01:38Z