Data-driven advice for interpreting local and global model predictions in bioinformatics problems
- URL: http://arxiv.org/abs/2108.06201v1
- Date: Fri, 13 Aug 2021 12:41:39 GMT
- Title: Data-driven advice for interpreting local and global model predictions in bioinformatics problems
- Authors: Markus Loecher and Qi Wu
- Abstract summary: Conditional feature contributions (CFCs) provide \textit{local}, case-by-case explanations of a prediction.
We compare the explanations computed by both methods on a set of 164 publicly available classification problems.
For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores.
- Score: 17.685881417954782
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tree-based algorithms such as random forests and gradient boosted trees
continue to be among the most popular and powerful machine learning models used
across multiple disciplines. The conventional approach to estimating the impact
of a feature in tree-based models is to measure the \textit{node-wise reduction
of a loss function}, which (i) yields only global importance measures and (ii)
is known to suffer from severe biases. Conditional feature contributions (CFCs)
provide \textit{local}, case-by-case explanations of a prediction by following
the decision path and attributing changes in the expected output of the model
to each feature along the path. However, Lundberg et al. pointed out a
potential bias of CFCs which depends on the distance from the root of a tree.
SHapley Additive exPlanation (SHAP) values, the now immensely popular
alternative, appear to mitigate this bias but are computationally much more
expensive. Here we contribute a thorough comparison of the explanations
computed by both methods on a set of 164 publicly available classification
problems in order to provide data-driven algorithm recommendations to current
researchers. For random forests, we find extremely high similarities and
correlations of both local and global SHAP values and CFC scores, leading to
very similar rankings and interpretations. Analogous conclusions hold for the
fidelity of using global feature importance scores as a proxy for the
predictive power associated with each feature.
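Because the CFC definition is constructive (follow the decision path and attribute each node-to-child change in the expected output to the split feature), it fits in a few lines of code. The sketch below is a minimal illustration for a single scikit-learn tree, not the paper's implementation; in practice the treeinterpreter package computes CFCs for whole forests and shap.TreeExplainer supplies the SHAP values, and the dataset and hyperparameters here are arbitrary choices.

```python
# Hedged sketch: conditional feature contribution (a.k.a. Saabas value) for
# one sample and one fitted scikit-learn decision tree.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

def cfc(tree, x):
    """Walk x's decision path, crediting each split feature with the change
    in node-wise class-1 probability. Returns (bias, contributions)."""
    t = tree.tree_
    # Normalizing makes this work whether sklearn stores class counts or
    # class proportions in tree_.value.
    prob = t.value[:, 0, :] / t.value[:, 0, :].sum(axis=1, keepdims=True)
    node, contrib = 0, np.zeros(x.shape[0])
    bias = prob[0, 1]                              # root-node probability
    while t.children_left[node] != -1:             # descend to a leaf
        feat = t.feature[node]
        child = (t.children_left[node] if x[feat] <= t.threshold[node]
                 else t.children_right[node])
        contrib[feat] += prob[child, 1] - prob[node, 1]
        node = child
    return bias, contrib

bias, contrib = cfc(clf, X[0])
# Local explanations are additive: bias + contributions = predicted probability.
assert np.isclose(bias + contrib.sum(), clf.predict_proba(X[:1])[0, 1])
```

Averaging these per-tree contributions across a forest yields the forest-level CFCs whose local and global rankings the paper compares against SHAP values.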
Related papers
- Graph Out-of-Distribution Generalization via Causal Intervention [69.70137479660113]
We introduce a conceptually simple yet principled approach for training robust graph neural networks (GNNs) under node-level distribution shifts.
Our method rests on a new learning objective derived from causal inference that coordinates an environment estimator with a mixture-of-experts GNN predictor.
Our model effectively enhances generalization under various types of distribution shift and yields up to a 27.4% accuracy improvement over state-of-the-art methods on graph OOD generalization benchmarks.
arXiv Detail & Related papers (2024-02-18T07:49:22Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes [1.2774526936067927]
This paper uses trees to represent data observation processes behind given data.
We derive the statistically optimal prediction, which is robust against overfitting.
We compute it with a Markov chain Monte Carlo method whose step size is adaptively tuned according to the posterior distribution over trees.
arXiv Detail & Related papers (2023-06-12T12:14:57Z)
- SETAR-Tree: A Novel and Accurate Tree Algorithm for Global Time Series Forecasting [7.206754802573034]
In this paper, we explore the close connections between TAR models and regression trees.
We introduce a new forecasting-specific tree algorithm that trains global Pooled Regression (PR) models in the leaves.
In our evaluation, the proposed tree and forest models achieve significantly higher accuracy than a set of state-of-the-art tree-based algorithms.
arXiv Detail & Related papers (2022-11-16T04:30:42Z)
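The core idea summarized above, a tree that routes each input window to a regime and then applies a pooled regression model in that leaf, can be illustrated with a deliberately simplified two-stage sketch. Everything below (synthetic data, a generic scikit-learn partitioning tree, one LinearRegression per leaf) is an assumption made for illustration; the actual SETAR-Tree uses the forecasting-specific splitting and stopping rules described in the paper.

```python
# Hedged sketch of "pooled regression models in the leaves" (not the authors'
# SETAR-Tree): a shallow tree picks the regime, a per-leaf linear model predicts.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy global setup: lagged windows X (3 lags) predicting the next value y,
# with two regimes switched by the sign of the first lag.
X = rng.normal(size=(500, 3))
y = np.where(X[:, 0] > 0, 1.5 * X[:, 1], -0.5 * X[:, 2]) + 0.1 * rng.normal(size=500)

# Stage 1: a shallow tree supplies the regime partition.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50, random_state=0).fit(X, y)
leaf_of = tree.apply(X)

# Stage 2: one pooled linear regression per leaf, sharing its parameters
# across every window routed to that leaf.
leaf_models = {leaf: LinearRegression().fit(X[leaf_of == leaf], y[leaf_of == leaf])
               for leaf in np.unique(leaf_of)}

def predict(X_new):
    """Route each window to its regime leaf, then apply that leaf's model."""
    leaves = tree.apply(X_new)
    out = np.empty(len(X_new))
    for leaf, model in leaf_models.items():
        mask = leaves == leaf
        if mask.any():
            out[mask] = model.predict(X_new[mask])
    return out

print(predict(X[:5]))   # compare against y[:5]
```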
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on GitHub.
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
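Because HS is purely post hoc, a sketch is short: predictions follow each sample's decision path, and every child-minus-parent increment is damped by 1 / (1 + lambda / n_parent), where n_parent is the number of training samples at the parent node. The code below is my reading of that rule for a scikit-learn regression tree with an arbitrary fixed lambda (the paper selects it by cross-validation); the authors' released package is the reference implementation.

```python
# Hedged sketch of hierarchical shrinkage applied post hoc: the fitted tree
# is untouched, only the prediction rule changes.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

def hs_predict(tree, X, lam=50.0):
    """Follow each sample's decision path, shrinking every child-minus-parent
    step by 1 / (1 + lam / n_samples(parent))."""
    t = tree.tree_
    value = t.value[:, 0, 0]                  # node-mean response
    n = t.n_node_samples.astype(float)
    preds = np.empty(len(X))
    for i, x in enumerate(X):
        node, pred = 0, value[0]              # start from the root mean
        while t.children_left[node] != -1:    # descend to a leaf
            child = (t.children_left[node]
                     if x[t.feature[node]] <= t.threshold[node]
                     else t.children_right[node])
            pred += (value[child] - value[node]) / (1.0 + lam / n[node])
            node = child
        preds[i] = pred
    return preds

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
deep = DecisionTreeRegressor(random_state=0).fit(X, y)   # deliberately unpruned
print(hs_predict(deep, X[:3], lam=50.0))
```

With lam=0 every damping factor is 1 and the sum telescopes back to the ordinary leaf prediction, which makes a quick sanity check.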
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Precision-Recall Curve (PRC) Classification Trees [5.503321733964237]
We propose a novel tree-based algorithm based on the area under the precision-recall curve (AUPRC) for variable selection in the classification context.
Our algorithm, named the "Precision-Recall Curve classification tree" or simply the "PRC classification tree", modifies two crucial stages of tree building.
arXiv Detail & Related papers (2020-11-15T22:31:06Z)
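Since the abstract names AUPRC as the selection criterion, a toy split scorer shows the flavor: each candidate split is judged by the average precision obtained when every sample is scored with its side's positive rate. This single-feature scorer is my own simplification for illustration, not the paper's two-stage procedure.

```python
# Hedged sketch: rank candidate split thresholds by the AUPRC of the
# resulting two-group scoring, instead of an impurity reduction.
import numpy as np
from sklearn.metrics import average_precision_score

def auprc_of_split(x, y, threshold):
    """Score one split: each sample gets its side's positive rate as a score,
    and the split is judged by the resulting AUPRC."""
    left = x <= threshold
    left_rate = y[left].mean() if left.any() else 0.0
    right_rate = y[~left].mean() if (~left).any() else 0.0
    scores = np.where(left, left_rate, right_rate)
    return average_precision_score(y, scores)

def best_auprc_split(x, y):
    """Exhaustively pick the threshold on one feature maximizing split AUPRC."""
    candidates = np.unique(x)[:-1]
    return max((auprc_of_split(x, y, t), t) for t in candidates)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0.8).astype(int)   # imbalanced classes
print(best_auprc_split(x, y))                             # (best AUPRC, threshold)
```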
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique, via node cloning, that ensures the target problem can be solved in a run time independent of the form of the corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.