Selecting Robust Features for Machine Learning Applications using
Multidata Causal Discovery
- URL: http://arxiv.org/abs/2304.05294v5
- Date: Fri, 30 Jun 2023 14:14:23 GMT
- Authors: Saranya Ganesh S., Tom Beucler, Frederick Iat-Hin Tam, Milton S.
Gomez, Jakob Runge, and Andreas Gerhardus
- Abstract summary: We introduce a Multidata causal feature selection approach that simultaneously processes an ensemble of time series datasets.
This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package.
We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones.
- Score: 7.8814500102882805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust feature selection is vital for creating reliable and interpretable
Machine Learning (ML) models. When designing statistical prediction models in
cases where domain knowledge is limited and underlying interactions are
unknown, choosing the optimal set of features is often difficult. To mitigate
this issue, we introduce a Multidata (M) causal feature selection approach that
simultaneously processes an ensemble of time series datasets and produces a
single set of causal drivers. This approach uses the causal discovery
algorithms PC1 or PCMCI that are implemented in the Tigramite Python package.
These algorithms utilize conditional independence tests to infer parts of the
causal graph. Our causal feature selection approach filters out
causally-spurious links before passing the remaining causal features as inputs
to ML models (multiple linear regression, random forest) that predict the
targets. We apply our framework to the statistical intensity prediction of
Western Pacific Tropical Cyclones (TC), for which it is often difficult to
accurately choose drivers and their dimensionality reduction (time lags,
vertical levels, and area-averaging). Using more stringent significance
thresholds in the conditional independence tests helps eliminate spurious
causal relationships, thus helping the ML model generalize better to unseen TC
cases. M-PC1 with a reduced number of features outperforms M-PCMCI, non-causal
ML, and other feature selection methods (lagged correlation, random), even
slightly outperforming feature selection based on eXplainable Artificial
Intelligence. The optimal causal drivers obtained from our causal feature
selection help improve our understanding of underlying relationships and
suggest new potential drivers of TC intensification.
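For readers who want to try the pipeline, the sketch below runs the two stages on synthetic data: causal parent discovery with Tigramite (the PC1 condition-selection stage via run_pc_stable, with run_pcmci available for full PCMCI), followed by a random forest fit on the selected lagged drivers. The synthetic series, variable names, lags, and pc_alpha value are illustrative assumptions, not the paper's tropical-cyclone setup.

```python
# Minimal sketch, assuming Tigramite >= 5.1 and scikit-learn are installed.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
# Older Tigramite versions instead use: from tigramite.independence_tests import ParCorr
from tigramite.independence_tests.parcorr import ParCorr

# Synthetic series: variable 0 (the "target") is driven by lagged X1 and X2.
rng = np.random.default_rng(0)
T, N = 500, 5
data = rng.standard_normal((T, N))
data[2:, 0] += 0.7 * data[1:-1, 1] + 0.4 * data[:-2, 2]

dataframe = pp.DataFrame(data, var_names=[f"X{i}" for i in range(N)])
# Multidata setting (an ensemble of datasets): recent Tigramite versions accept
# a dict of arrays, e.g. pp.DataFrame({0: a, 1: b}, analysis_mode="multiple").

pcmci = PCMCI(dataframe=dataframe, cond_ind_test=ParCorr())
tau_max, pc_alpha = 3, 0.01  # stringent threshold to prune spurious links

# PC1 condition-selection stage only; pcmci.run_pcmci(...) would run full PCMCI.
parents = pcmci.run_pc_stable(tau_min=1, tau_max=tau_max, pc_alpha=pc_alpha)
target = 0
assert parents[target], "no causal parents found; try a larger pc_alpha"

# Build a lagged design matrix from the causal parents of the target.
# Each parent is a (variable, lag) pair with a negative lag.
X = np.column_stack([data[tau_max + tau:T + tau, i] for i, tau in parents[target]])
y = data[tau_max:, target]
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print("parents of X0:", parents[target])
print("in-sample R^2:", round(model.score(X, y), 3))
```

In the paper's Multidata variant the conditional independence tests pool the ensemble of datasets into a single set of parents, but the downstream regression step is the same.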
Related papers
- Towards Robust Text Classification: Mitigating Spurious Correlations with Causal Learning [2.7813683000222653]
We propose the Causally Calibrated Robust Classifier (CCR) to reduce models' reliance on spurious correlations.
CCR integrates a causal feature selection method based on counterfactual reasoning, along with an inverse propensity weighting (IPW) loss function.
We show that CCR achieves state-of-the-art performance among methods that do not use group labels and, in some cases, can compete with models that do. (A generic sketch of such an IPW loss appears after this list.)
arXiv Detail & Related papers (2024-11-01T21:29:07Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z)
- Confidence-Based Model Selection: When to Take Shortcuts for Subpopulation Shifts [119.22672589020394]
We propose COnfidence-baSed MOdel Selection (CosMoS), where model confidence can effectively guide model selection.
We evaluate CosMoS on four datasets with spurious correlations, each with multiple test sets with varying levels of data distribution shift.
arXiv Detail & Related papers (2023-06-19T18:48:15Z)
- Flexible variable selection in the presence of missing data [0.0]
We propose a non-parametric variable selection algorithm combined with multiple imputation to develop flexible panels in the presence of missing-at-random data.
We show that our proposal has good operating characteristics and results in panels with higher classification and variable selection performance.
arXiv Detail & Related papers (2022-02-25T21:41:03Z)
- Understanding Interlocking Dynamics of Cooperative Rationalization [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with this cooperative rationalization paradigm: model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
arXiv Detail & Related papers (2021-10-26T17:39:18Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Improving Sample and Feature Selection with Principal Covariates Regression [0.0]
We focus on two popular sub-selection schemes which have been applied to this end.
We show that incorporating target information provides selections that perform better in supervised tasks.
We also show that incorporating aspects of simple supervised learning models can improve the accuracy of more complex models.
arXiv Detail & Related papers (2020-12-22T18:52:06Z)
- Feature Selection for Huge Data via Minipatch Learning [0.0]
We propose Stable Minipatch Selection (STAMPS) and Adaptive STAMPS.
STAMPS are meta-algorithms that build ensembles of selection events of base feature selectors trained on tiny, (possibly-adaptive) random subsets of both the observations and features of the data. (A minimal sketch of this minipatch ensembling idea appears after this list.)
Our approaches are general and can be employed with a variety of existing feature selection strategies and machine learning techniques.
arXiv Detail & Related papers (2020-10-16T17:41:08Z)
- Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning [100.83444258562263]
We propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting.
In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions.
We are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose.
arXiv Detail & Related papers (2020-01-12T09:42:19Z)
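The CCR entry above pairs counterfactual causal feature selection with an inverse propensity weighting (IPW) loss. Below is a generic, hedged sketch of such a loss in PyTorch; the self-normalized weighting and the assumed propensity inputs are illustrative choices, not the paper's exact objective.

```python
# Generic IPW-weighted cross-entropy, a minimal sketch.
import torch
import torch.nn.functional as F

def ipw_cross_entropy(logits, labels, propensities, eps=1e-6):
    """Cross-entropy where each example is reweighted by 1 / propensity,
    down-weighting examples the biased sampling process over-represents.
    Uses a self-normalized (Hajek-style) weighted mean."""
    weights = 1.0 / propensities.clamp(min=eps)
    per_example = F.cross_entropy(logits, labels, reduction="none")
    return (weights * per_example).sum() / weights.sum()
```

Usage would look like `loss = ipw_cross_entropy(model(x), y, p_hat)`, where `p_hat` comes from a separately fitted propensity model (an assumption here, since the summary does not specify one).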
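The minipatch entry above describes ensembling the selection events of a base selector over tiny random subsets of rows and columns. The sketch below illustrates that idea with lasso as an assumed base selector and a simple selection-frequency threshold; it is not the exact STAMPS procedure.

```python
# Minipatch-style ensemble feature selection, a minimal sketch.
import numpy as np
from sklearn.linear_model import Lasso

def minipatch_selection(X, y, n_patches=200, n_rows=50, n_cols=10,
                        alpha=0.1, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)   # times each feature was selected by the base selector
    appears = np.zeros(p)  # times each feature entered a minipatch
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_rows, replace=False)
        cols = rng.choice(p, size=n_cols, replace=False)
        coef = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows]).coef_
        appears[cols] += 1
        counts[cols[np.abs(coef) > 1e-8]] += 1
    # Selection frequency among the minipatches each feature appeared in
    freq = np.divide(counts, appears, out=np.zeros_like(counts),
                     where=appears > 0)
    return np.flatnonzero(freq >= threshold)

# e.g. stable = minipatch_selection(X, y) returns the indices of features
# selected in at least half of the minipatches they entered.
```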