Optimizing Feature Selection in Causal Inference: A Three-Stage Computational Framework for Unbiased Estimation
- URL: http://arxiv.org/abs/2502.00501v1
- Date: Sat, 01 Feb 2025 17:47:28 GMT
- Title: Optimizing Feature Selection in Causal Inference: A Three-Stage Computational Framework for Unbiased Estimation
- Authors: Tianyu Yang, Md. Noor-E-Alam,
- Abstract summary: We propose an enhanced three-stage framework that shows a significant improvement in selecting the desired subset of variables.
We evaluated our proposed framework using a state-of-the-art synthetic data across various settings and observed superior performance within a feasible time.
- Score: 13.198582304877513
- License:
- Abstract: Feature selection is an important but challenging task in causal inference for obtaining unbiased estimates of causal quantities. Properly selected features in causal inference not only significantly reduce the time required to implement a matching algorithm but, more importantly, can also reduce the bias and variance when estimating causal quantities. When feature selection techniques are applied in causal inference, the crucial criterion is to select variables that, when used for matching, can achieve an unbiased and robust estimation of causal quantities. Recent research suggests that balancing only on treatment-associated variables introduces bias while balancing on spurious variables increases variance. To address this issue, we propose an enhanced three-stage framework that shows a significant improvement in selecting the desired subset of variables compared to the existing state-of-the-art feature selection framework for causal inference, resulting in lower bias and variance in estimating the causal quantity. We evaluated our proposed framework using a state-of-the-art synthetic data across various settings and observed superior performance within a feasible computation time, ensuring scalability for large-scale datasets. Finally, to demonstrate the applicability of our proposed methodology using large-scale real-world data, we evaluated an important US healthcare policy related to the opioid epidemic crisis: whether opioid use disorder has a causal relationship with suicidal behavior.
Related papers
- Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables [15.105594376616253]
Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science.
We propose a novel local learning approach for covariate selection in nonparametric causal effect estimation.
We validate our algorithm through extensive experiments on both synthetic and real-world data.
arXiv Detail & Related papers (2024-11-25T12:08:54Z) - Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Interval Estimation of Coefficients in Penalized Regression Models of Insurance Data [3.5637073151604093]
Tweedie exponential dispersion family is a popular choice among many to model insurance losses.
It is often important to obtain credibility (inference) of the most important features that describe the endogenous variables.
arXiv Detail & Related papers (2024-10-01T18:57:18Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Active Learning for Optimal Intervention Design in Causal Models [11.294389953686945]
We develop a causal active learning strategy to identify interventions that are optimal, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean.
We apply our approach to both synthetic data and single-cell transcriptomic data from Perturb-CITE-seq experiments to identify optimal perturbations that induce a specific cell state transition.
arXiv Detail & Related papers (2022-09-10T20:40:30Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - A Two-Stage Feature Selection Approach for Robust Evaluation of
Treatment Effects in High-Dimensional Observational Data [1.4710887888397084]
We propose a novel two-stage feature selection technique called, Outcome Adaptive Elastic Net (OAENet)
OAENet is explicitly designed for making robust causal inference decisions using matching techniques.
Numerical experiments on simulated data demonstrate that OAENet significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-11-27T02:54:30Z) - BayesIMP: Uncertainty Quantification for Causal Data Fusion [52.184885680729224]
We study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable.
We introduce a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space.
arXiv Detail & Related papers (2021-06-07T10:14:18Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - MissDeepCausal: Causal Inference from Incomplete Data Using Deep Latent
Variable Models [14.173184309520453]
State-of-the-art methods for causal inference don't consider missing values.
Missing data require an adapted unconfoundedness hypothesis.
Latent confounders whose distribution is learned through variational autoencoders adapted to missing values are considered.
arXiv Detail & Related papers (2020-02-25T12:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.