Inferring independent sets of Gaussian variables after thresholding correlations
- URL: http://arxiv.org/abs/2211.01521v1
- Date: Wed, 2 Nov 2022 23:47:32 GMT
- Title: Inferring independent sets of Gaussian variables after thresholding correlations
- Authors: Arkajyoti Saha, Daniela Witten, Jacob Bien
- Abstract summary: We consider testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables.
We develop a new characterization of the conditioning event in terms of the canonical correlation between the groups of random variables.
In simulation studies and in the analysis of gene co-expression networks, we show that our approach has much higher power than a "naive" approach that ignores the effect of selection.
- Score: 1.3535770763481905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider testing whether a set of Gaussian variables, selected from the
data, is independent of the remaining variables. We assume that this set is
selected via a very simple approach that is commonly used across scientific
disciplines: we select a set of variables for which the correlation with all
variables outside the set falls below some threshold. Unlike other settings in
selective inference, failure to account for the selection step leads, in this
setting, to excessively conservative (as opposed to anti-conservative) results.
Our proposed test properly accounts for the fact that the set of variables is
selected from the data, and thus is not overly conservative. To develop our
test, we condition on the event that the selection resulted in the set of
variables in question. To achieve computational tractability, we develop a new
characterization of the conditioning event in terms of the canonical
correlation between the groups of random variables. In simulation studies and
in the analysis of gene co-expression networks, we show that our approach has
much higher power than a "naive" approach that ignores the effect of selection.
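The selection rule and the canonical-correlation characterization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold value, the toy data, and the one-shot reading of the selection rule are our assumptions.

```python
import numpy as np

def threshold_select(X, c=0.3):
    """Indices of variables whose absolute sample correlation with every
    other variable falls below the threshold c (a one-shot reading of the
    thresholding rule the paper conditions on)."""
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)
    return np.where(np.all(np.abs(R) < c, axis=1))[0]

def _inv_sqrt(M):
    # Symmetric inverse square root via an eigendecomposition.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def max_canonical_correlation(X, idx):
    """Largest canonical correlation between X[:, idx] and the remaining
    columns: the top singular value of the whitened cross-covariance."""
    rest = np.setdiff1d(np.arange(X.shape[1]), idx)
    A = X[:, idx] - X[:, idx].mean(axis=0)
    B = X[:, rest] - X[:, rest].mean(axis=0)
    M = _inv_sqrt(A.T @ A) @ (A.T @ B) @ _inv_sqrt(B.T @ B)
    return np.linalg.svd(M, compute_uv=False)[0]

# Toy data: columns 0-2 are independent noise; columns 3-5 share a latent
# factor, so their pairwise correlations exceed the threshold.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = np.hstack([rng.normal(size=(n, 3)), z + 0.5 * rng.normal(size=(n, 3))])

selected = threshold_select(X, c=0.3)          # the three noise columns
rho = max_canonical_correlation(X, selected)   # small: the groups are independent
```

The canonical correlation computed here is the quantity in terms of which the paper characterizes the conditioning event; in this sketch it is small because the selected and remaining groups are generated independently.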
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure by testing a hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems.
Our framework can handle extremely large-scale selection tasks, where the number of data samples may reach into the millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
- Copula Entropy based Variable Selection for Survival Analysis [2.3980064191633232]
We propose to apply the Copula Entropy (CE)-based method for variable selection to survival analysis.
The idea is to measure the correlation between variables and time-to-event with CE and then select variables according to their CE value.
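The two-step recipe (estimate CE, then rank variables by it) can be sketched as follows. This is our illustration, not the paper's code: it uses the identity CE(X, Y) = -MI(X, Y) with a simple histogram estimator on rank-transformed data, ignores censoring, and the bin count and toy data are assumptions.

```python
import numpy as np

def copula_entropy(x, y, bins=10):
    """Plug-in estimate of the copula entropy CE(x, y) = -MI(x, y), via a
    2-D histogram on the empirical copula (rank-transformed data)."""
    u = np.argsort(np.argsort(x)) / (len(x) - 1)
    v = np.argsort(np.argsort(y)) / (len(y) - 1)
    pxy, _, _ = np.histogram2d(u, v, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    mi = np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask]))
    return -mi  # more negative = stronger dependence

# Toy survival data (censoring ignored): one covariate drives the event
# time, the other is pure noise; selection keeps variables with lower CE.
rng = np.random.default_rng(1)
n = 2000
x_informative = rng.normal(size=n)
x_noise = rng.normal(size=n)
time_to_event = np.exp(x_informative + 0.5 * rng.normal(size=n))

ce_informative = copula_entropy(x_informative, time_to_event)
ce_noise = copula_entropy(x_noise, time_to_event)
```

Because copula entropy depends only on ranks, it is invariant to the monotone transform linking the linear predictor to the event time, which is what makes it convenient for survival-type responses.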
arXiv Detail & Related papers (2022-09-04T08:14:07Z)
- Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
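One way to compute such a label-free invariance score can be sketched as follows. This is a toy illustration, not the paper's measure: Gaussian input perturbations stand in for the transformation neighborhood, and the linear classifier and perturbation scale are our assumptions.

```python
import numpy as np

def neighborhood_invariance(predict, x, n_samples=100, sigma=0.1, seed=0):
    """Fraction of small random perturbations of x whose predicted label
    matches the prediction at x -- no true label is needed, so the score
    can be computed on out-of-domain points too."""
    rng = np.random.default_rng(seed)
    base = predict(x[None, :])[0]
    nbhd = x[None, :] + sigma * rng.normal(size=(n_samples, x.size))
    return np.mean(predict(nbhd) == base)

# Toy classifier: predict the sign of the first coordinate.
predict = lambda X: (X[:, 0] > 0).astype(int)

inv_far = neighborhood_invariance(predict, np.array([2.0, 0.0]))    # deep inside a class
inv_near = neighborhood_invariance(predict, np.array([0.01, 0.0]))  # near the boundary
```

Points far from the decision boundary get an invariance score near 1, while points near the boundary score lower, which is the intuition behind correlating the measure with OOD generalization.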
arXiv Detail & Related papers (2022-07-05T14:55:16Z)
- Conditional Variable Selection for Intelligent Test [5.904240881373805]
We discuss a novel conditional variable selection framework that can select the most important candidate variables given a set of preselected variables.
arXiv Detail & Related papers (2022-07-01T11:01:53Z)
- Nonparametric Conditional Local Independence Testing [69.31200003384122]
Conditional local independence is an independence relation among continuous-time processes.
No nonparametric test of conditional local independence has been available.
We propose such a nonparametric test based on double machine learning.
arXiv Detail & Related papers (2022-03-25T10:31:02Z)
- Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression [9.774282306558465]
We introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression, which handles logistic regression as a special case.
In experiments we demonstrate the effectiveness of our approach, including on data with seventeen thousand covariates.
arXiv Detail & Related papers (2021-06-28T20:54:41Z)
- Safe Tests and Always-Valid Confidence Intervals for contingency tables and beyond [69.25055322530058]
We develop E-variables for testing whether or not two data streams come from the same source.
These E variables lead to tests that remain safe, under flexible sampling scenarios such as optional stopping and continuation.
arXiv Detail & Related papers (2021-06-04T20:12:13Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose an algorithm based on conditional independence tests to separate causal variables, using a seed variable as a priori knowledge, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-02-19T20:16:10Z)
- On conditional versus marginal bias in multi-armed bandits [105.07190334523304]
The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis.
We characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean.
Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy.
arXiv Detail & Related papers (2020-02-19T20:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.