Predictive Data Calibration for Linear Correlation Significance Testing
- URL: http://arxiv.org/abs/2208.07081v1
- Date: Mon, 15 Aug 2022 09:19:06 GMT
- Title: Predictive Data Calibration for Linear Correlation Significance Testing
- Authors: Kaustubh R. Patil, Simon B. Eickhoff, Robert Langner
- Abstract summary: Pearson's correlation coefficient (PCC) is known to fall short both in estimating the strength of a linear relationship and in assessing whether it is meaningful for the population.
We propose a machine-learning-based predictive data calibration method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inferring linear relationships lies at the heart of many empirical
investigations. A measure of linear dependence should correctly evaluate the
strength of the relationship as well as qualify whether it is meaningful for
the population. Pearson's correlation coefficient (PCC), the \textit{de facto}
measure for bivariate relationships, is known to fall short in both regards. The
estimated strength $r$ may be inaccurate due to limited sample size and non-normality
of the data. In the context of statistical significance testing, erroneous
interpretation of a $p$-value as posterior probability leads to Type I errors
-- a general issue with significance testing that extends to PCC. Such errors
are exacerbated when testing multiple hypotheses simultaneously. To tackle
these issues, we propose a machine-learning-based predictive data calibration
method which essentially conditions the data samples on the expected linear
relationship. Calculating PCC using calibrated data yields a calibrated
$p$-value that can be interpreted as posterior probability together with a
calibrated $r$ estimate, a desired outcome not provided by other methods.
Furthermore, the ensuing independent interpretation of each test might
eliminate the need for multiple testing correction. We provide empirical
evidence favouring the proposed method using several simulations and
an application to real-world data.
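For context, the standard PCC significance test that the abstract critiques can be reproduced in a few lines. The sketch below (Python, assuming NumPy and SciPy are available; the data and sample size are illustrative, and this is not the proposed calibration method) contrasts the parametric p-value, which leans on approximate normality, with a permutation p-value that does not.

```python
# Minimal sketch of the standard PCC significance test discussed in the
# abstract (not the proposed calibration method). Data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30                                   # small sample, where r is unstable
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)         # weak linear relationship

# Parametric test: r and a p-value from the t-distribution, which is
# reliable only under approximate bivariate normality.
r, p_param = stats.pearsonr(x, y)

# Permutation test: break the x-y pairing to build a null distribution
# of r without a normality assumption.
n_perm = 10_000
r_null = np.array([stats.pearsonr(x, rng.permutation(y))[0]
                   for _ in range(n_perm)])
p_perm = (np.sum(np.abs(r_null) >= abs(r)) + 1) / (n_perm + 1)

print(f"r = {r:.3f}, parametric p = {p_param:.4f}, permutation p = {p_perm:.4f}")
```

With small samples and non-normal data the two p-values can differ noticeably, and neither is a posterior probability, which is the gap the calibrated $p$-value described above is meant to close.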
Related papers
- Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory on a variety of high-dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure by testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Discriminative calibration: Check Bayesian computation from simulations and flexible classifier [23.91355980551754]
We propose to replace the marginal rank test with a flexible classification approach that learns test statistics from data.
We illustrate an automated implementation using neural networks and statistically-inspired features, and validate the method with numerical and real data experiments.
arXiv Detail & Related papers (2023-05-24T00:18:48Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy [6.944765747195337]
We formulate a semiparametric model of causal inference with high dimensional corrupted data.
We prove consistency and Gaussian approximation by finite sample arguments.
Our analysis provides nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics.
arXiv Detail & Related papers (2021-07-06T17:42:49Z)
- Testing for Outliers with Conformal p-values [14.158078752410182]
The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers.
We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points.
We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. A minimal sketch of the conformal p-value construction appears after this list.
arXiv Detail & Related papers (2021-04-16T17:59:21Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods may exploit subtle spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional-independence-test-based algorithm that separates causal variables from non-causal ones using a seed variable as a prior, and adopts the causal variables for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
- Stable Prediction with Model Misspecification and Agnostic Distribution Shift [41.26323389341987]
In machine learning algorithms, two main assumptions are required to guarantee performance.
One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified.
Under model misspecification, distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data.
We propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model.
arXiv Detail & Related papers (2020-01-31T08:56:35Z)
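The outlier-testing entry above is built on conformal p-values. As a rough illustration of that construction (not the cited paper's exact procedure), the split-conformal sketch below assumes a one-dimensional Gaussian reference sample and a distance-to-center nonconformity score, both of which are illustrative choices.

```python
# Minimal split-conformal p-value for outlier testing. The reference data,
# the split, and the distance-to-center nonconformity score are illustrative
# assumptions, not taken from the cited paper.
import numpy as np

def conformal_p_value(cal_scores, test_score):
    # p = (1 + #{calibration scores >= test score}) / (n + 1);
    # marginally valid under exchangeability of inliers.
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

rng = np.random.default_rng(1)
reference = rng.normal(size=1000)          # clean (inlier) reference sample
train, calib = reference[:500], reference[500:]

center = train.mean()                      # "model" fit on the training split
score = lambda z: np.abs(z - center)       # nonconformity: distance to center

cal_scores = score(calib)
print("p for a typical point:", conformal_p_value(cal_scores, score(0.2)))
print("p for a far-out point:", conformal_p_value(cal_scores, score(5.0)))
```

Across many test points such p-values are mutually dependent, which is why the positive-dependence result mentioned in that entry matters for applying standard false discovery rate procedures such as Benjamini-Hochberg.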
This list is automatically generated from the titles and abstracts of the papers in this site.