Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect
- URL: http://arxiv.org/abs/2501.16988v2
- Date: Wed, 29 Jan 2025 02:03:17 GMT
- Title: Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect
- Authors: Mohammad Kaviul Anam Khan, Olli Saarela, Rafal Kustra,
- Abstract summary: "Marginal Variable Importance Metric" (MVIM) is a model-agnostic measure of importance based on the true conditional expectation function.
MVIM is biased when predictors are highly correlated, as black-box models struggle to extrapolate in low-probability regions.
A Conditional Variable Importance Metric (CVIM) is introduced to reduce this bias.
- Score: 0.18906710320196732
- License:
- Abstract: Interpreting black-box machine learning models is challenging due to their strong dependence on data and inherently non-parametric nature. This paper reintroduces the concept of importance through "Marginal Variable Importance Metric" (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates predictors' influence on continuous or discrete outcomes. A permutation-based estimation approach, inspired by \citet{breiman2001random} and \citet{fisher2019all}, is proposed to estimate MVIM. MVIM estimator is biased when predictors are highly correlated, as black-box models struggle to extrapolate in low-probability regions. To address this, we investigated the bias-variance decomposition of MVIM to understand the source and pattern of the bias under high correlation. A Conditional Variable Importance Metric (CVIM), adapted from \citet{strobl2008conditional}, is introduced to reduce this bias. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).
Related papers
- Model aggregation: minimizing empirical variance outperforms minimizing
empirical error [0.29008108937701327]
We propose a data-driven framework that aggregates predictions from diverse models into a single, more accurate output.
It is non-intrusive - treating models as black-box functions - model-agnostic, requires minimal assumptions, and can combine outputs from a wide range of models.
We show how it successfully integrates traditional solvers with machine learning models to improve both robustness and accuracy.
arXiv Detail & Related papers (2024-09-25T18:33:21Z) - Exploiting Observation Bias to Improve Matrix Completion [15.171759590760574]
We propose a natural model where the observation pattern and outcome of interest are driven by the same set of underlying latent (or unobserved) factors.
We devise Mask Nearest Neighbor (MNN), a novel two-stage matrix completion algorithm.
Our analysis reveals that MNN enjoys entry-wise finite-sample error rates that are competitive with corresponding supervised learning parametric rates.
arXiv Detail & Related papers (2023-06-07T20:48:35Z) - Biases in Inverse Ising Estimates of Near-Critical Behaviour [0.0]
Inverse inference allows pairwise interactions to be reconstructed from empirical correlations.
We show that estimators used for this inference, such as Pseudo-likelihood (PLM), are biased.
Data-driven methods are explored and applied to a functional magnetic resonance imaging (fMRI) dataset from neuroscience.
arXiv Detail & Related papers (2023-01-13T14:01:43Z) - A Generalized Variable Importance Metric and Estimator for Black Box
Machine Learning Models [0.21249247666376617]
We define a population parameter, Generalized Variable Importance Metric (GVIM)'', to measure importance of predictors for black box machine learning methods.
We extend previously published results to show that the defined GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for any kind of a predictor.
arXiv Detail & Related papers (2022-12-20T00:50:28Z) - Monotonicity and Double Descent in Uncertainty Estimation with Gaussian
Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that by tuning hyper parameters, the performance, as measured by the marginal likelihood, improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.
arXiv Detail & Related papers (2022-10-14T08:09:33Z) - On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Beyond Marginal Uncertainty: How Accurately can Bayesian Regression
Models Estimate Posterior Predictive Correlations? [13.127549105535623]
It is often more useful to estimate predictive correlations between the function values at different input locations.
We first consider a downstream task which depends on posterior predictive correlations: transductive active learning (TAL)
Since TAL is too expensive and indirect to guide development of algorithms, we introduce two metrics which more directly evaluate the predictive correlations.
arXiv Detail & Related papers (2020-11-06T03:48:59Z) - Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlation during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.