Copula Entropy based Variable Selection for Survival Analysis
- URL: http://arxiv.org/abs/2209.01561v1
- Date: Sun, 4 Sep 2022 08:14:07 GMT
- Title: Copula Entropy based Variable Selection for Survival Analysis
- Authors: Jian Ma
- Abstract summary: We propose to apply the Copula Entropy (CE)-based method for variable selection to survival analysis.
The idea is to measure the correlation between variables and time-to-event with CE and then select variables according to their CE value.
- Score: 2.3980064191633232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variable selection is an important problem in statistics and machine
learning. Copula Entropy (CE) is a mathematical concept for measuring
statistical independence and has been applied to variable selection recently.
In this paper we propose to apply the CE-based method for variable selection to
survival analysis. The idea is to measure the correlation between variables and
time-to-event with CE and then select variables according to their CE value.
Experiments on simulated data and two real cancer datasets were conducted to
compare the proposed method with two related methods: random survival forest
and Lasso-Cox. Experimental results showed that the proposed method selects
the 'right' variables, which are more interpretable and lead to better
prediction performance.
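The abstract's recipe (rank covariates by the strength of their copula-entropy dependence with the time-to-event variable) can be sketched as follows. Copula entropy equals negative mutual information, so one simple surrogate is to rank-transform each pair to its empirical copula margins and estimate mutual information on the ranks. The kNN estimator from scikit-learn stands in for the paper's own CE estimator here, and keeping a fixed number `k` of variables is an illustrative choice, not the paper's selection rule.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.feature_selection import mutual_info_regression

def copula_entropy(x, y, random_state=0):
    """Estimate CE(x, y) = -MI(x, y) on the empirical copula (rank) scale."""
    u = rankdata(x) / (len(x) + 1)   # empirical copula margins in (0, 1)
    v = rankdata(y) / (len(y) + 1)
    mi = mutual_info_regression(u.reshape(-1, 1), v,
                                n_neighbors=5, random_state=random_state)[0]
    return -mi                       # CE is the negative of mutual information

def select_by_ce(X, time_to_event, k=2):
    """Rank covariates by CE with the survival time; keep the k strongest."""
    ce = np.array([copula_entropy(X[:, j], time_to_event)
                   for j in range(X.shape[1])])
    return np.argsort(ce)[:k]        # most negative CE = strongest dependence

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
# Synthetic log-normal event times driven by columns 0 and 2 only.
t = np.exp(0.8 * X[:, 0] - 0.8 * X[:, 2] + 0.3 * rng.normal(size=n))
print(sorted(select_by_ce(X, t)))    # the informative columns 0 and 2
```

Because CE is computed on ranks, the ranking is invariant to monotone transformations of both the covariate and the event time, which is what makes it attractive for skewed survival times.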
Related papers
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Scalable variable selection for two-view learning tasks with projection operators [0.0]
We propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems.
Our framework is able to handle extremely large-scale selection tasks, where the number of data samples can reach millions.
arXiv Detail & Related papers (2023-07-04T08:22:05Z)
- Inferring independent sets of Gaussian variables after thresholding correlations [1.3535770763481905]
We consider testing whether a set of Gaussian variables, selected from the data, is independent of the remaining variables.
We develop a new characterization of the conditioning event in terms of the canonical correlation between the groups of random variables.
In simulation studies and in the analysis of gene co-expression networks, we show that our approach has much higher power than a 'naive' approach that ignores the effect of selection.
arXiv Detail & Related papers (2022-11-02T23:47:32Z)
- Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection [3.8761064607384195]
We propose a variational Bayesian proportional hazards model for prediction and variable selection regarding high-dimensional survival data.
Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC.
We demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes.
arXiv Detail & Related papers (2021-12-19T22:10:41Z)
- Variance Minimization in the Wasserstein Space for Invariant Causal Prediction [72.13445677280792]
In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors.
Each of these tests relies on the minimization of a novel loss function that is derived from tools in optimal transport theory.
We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
arXiv Detail & Related papers (2021-10-13T22:30:47Z)
- Variable selection with missing data in both covariates and outcomes: Imputation and machine learning [1.0333430439241666]
The missing data issue is ubiquitous in health studies.
Machine learning methods relax parametric assumptions.
XGBoost and BART have the overall best performance across various settings.
arXiv Detail & Related papers (2021-04-06T20:18:29Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
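For context, the conventional two-step pipeline that this one-step approach streamlines can be sketched as follows: first estimate density-ratio importance weights by discriminating training inputs from test inputs, then fit an importance-weighted model. The logistic-regression ratio estimator and all variable names are illustrative choices for the standard baseline, not the paper's joint method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
# Training inputs centred at 0, test inputs shifted to 2 (covariate shift).
x_tr = rng.normal(0.0, 1.0, size=(400, 1))
x_te = rng.normal(2.0, 1.0, size=(400, 1))
y_tr = np.sin(x_tr[:, 0]) + 0.1 * rng.normal(size=400)

# Step 1: density-ratio weights w(x) ~ p_test(x) / p_train(x), estimated via a
# probabilistic classifier separating test (label 1) from train (label 0).
clf = LogisticRegression().fit(np.vstack([x_tr, x_te]),
                               np.r_[np.zeros(400), np.ones(400)])
p = clf.predict_proba(x_tr)[:, 1]
w = p / (1.0 - p)        # classifier odds = density ratio (equal sample sizes)

# Step 2: importance-weighted fit on the training data.
model = LinearRegression().fit(x_tr, y_tr, sample_weight=w)

# Weights up-weight training points that look like test points.
print(w[x_tr[:, 0] > 1].mean() > w[x_tr[:, 0] < 0].mean())  # True
```

The paper's point is that this decoupling is suboptimal: errors in the estimated weights propagate into the downstream model, whereas learning both jointly optimizes them for the prediction task.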
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods may exploit subtle spurious correlations in training data, induced by non-causal variables, for prediction.
We propose a conditional-independence-test-based algorithm that separates out causal variables using a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Interpretable random forest models through forward variable selection [0.0]
We develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function.
We demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands.
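The forward-selection loop described above can be sketched generically: greedily add the variable whose inclusion most lowers a held-out CRPS, scoring a random forest's per-tree predictions as the probabilistic ensemble. The ensemble-CRPS formula is standard; the stopping tolerance and the choice of random forest as the base model are assumptions for illustration, and the paper's exact setup may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ensemble_crps(ens, y):
    """CRPS of an m-member ensemble, shape (m, n), against observations y (n,)."""
    term1 = np.abs(ens - y).mean(axis=0)                       # E|X - y|
    term2 = np.abs(ens[:, None, :] - ens[None, :, :]).mean(axis=(0, 1))  # E|X - X'|
    return (term1 - 0.5 * term2).mean()

def forward_select(Xtr, ytr, Xva, yva, tol=1e-4):
    selected, best = [], np.inf
    remaining = list(range(Xtr.shape[1]))
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            rf = RandomForestRegressor(n_estimators=50, random_state=0)
            rf.fit(Xtr[:, cols], ytr)
            # Per-tree predictions act as the forecast ensemble.
            ens = np.stack([t.predict(Xva[:, cols]) for t in rf.estimators_])
            scores[j] = ensemble_crps(ens, yva)
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < tol:
            break                      # no meaningful CRPS improvement: stop
        selected.append(j_best)
        best = scores[j_best]
        remaining.remove(j_best)
    return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 1] + 0.2 * rng.normal(size=300)   # only column 1 is informative
sel = forward_select(X[:200], y[:200], X[200:], y[200:])
print(sel[0])                                    # the informative column, 1
```

Because CRPS rewards both calibration and sharpness of the whole predictive distribution, selection under this loss can differ from selection under squared error alone.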
arXiv Detail & Related papers (2020-05-11T13:56:49Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.