Trading Complexity for Sparsity in Random Forest Explanations
- URL: http://arxiv.org/abs/2108.05276v1
- Date: Wed, 11 Aug 2021 15:19:46 GMT
- Title: Trading Complexity for Sparsity in Random Forest Explanations
- Authors: Gilles Audemard and Steve Bellart and Louenas Bounia and Frédéric
Koriche and Jean-Marie Lagniez and Pierre Marquis
- Abstract summary: We introduce majoritary reasons which are prime implicants of a strict majority of decision trees.
Experiments conducted on various datasets reveal the existence of a trade-off between runtime complexity and sparsity.
- Score: 20.87501058448681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random forests have long been considered as powerful model ensembles in
machine learning. By training multiple decision trees, whose diversity is
fostered through data and feature subsampling, the resulting random forest can
lead to more stable and reliable predictions than a single decision tree. This
however comes at the cost of decreased interpretability: while decision trees
are often easily interpretable, the predictions made by random forests are much
more difficult to understand, as they involve a majority vote over hundreds of
decision trees. In this paper, we examine different types of reasons that
explain "why" an input instance is classified as positive or negative by a
Boolean random forest. Notably, as an alternative to sufficient reasons taking
the form of prime implicants of the random forest, we introduce majoritary
reasons which are prime implicants of a strict majority of decision trees. For
these different abductive explanations, the tractability of the generation
problem (finding one reason) and the minimization problem (finding one shortest
reason) are investigated. Experiments conducted on various datasets reveal the
existence of a trade-off between runtime complexity and sparsity. Sufficient
reasons - for which the identification problem is DP-complete - are slightly
larger than majoritary reasons that can be generated using a simple linear-
time greedy algorithm, and significantly larger than minimal majoritary reasons
that can be approached using an anytime Partial MaxSAT algorithm.
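The greedy linear-time procedure mentioned in the abstract (start from the instance's full term and drop literals one at a time, keeping a drop only while a strict majority of trees still implies the positive class) can be sketched as follows. This is a minimal illustration under assumed conventions: the nested-tuple tree encoding, the function names, and the toy forest are hypothetical, not the authors' implementation.

```python
# Sketch of a greedy majoritary-reason computation for a Boolean random forest.
# A tree is either a bool leaf or a tuple (feature, false_branch, true_branch).

def tree_implies(tree, term):
    """True iff every completion of the partial assignment `term`
    (a dict feature -> bool) reaches a positive leaf of `tree`,
    i.e. `term` is an implicant of the tree."""
    if isinstance(tree, bool):          # leaf: class label
        return tree
    feat, low, high = tree              # internal node
    if feat in term:
        return tree_implies(high if term[feat] else low, term)
    # feature unset in the term: both branches must be positive
    return tree_implies(low, term) and tree_implies(high, term)

def majoritary_reason(forest, instance):
    """One greedy pass over the literals of `instance`: drop a literal
    if a strict majority of trees still implies the positive class."""
    term = dict(instance)
    for feat in list(instance):
        del term[feat]
        votes = sum(tree_implies(t, term) for t in forest)
        if 2 * votes <= len(forest):    # no strict majority: restore literal
            term[feat] = instance[feat]
    return term

# Toy forest of three stumps over features a, b, c.
forest = [("a", False, True),
          ("b", False, True),
          ("a", False, ("c", False, True))]
instance = {"a": True, "b": True, "c": True}
print(majoritary_reason(forest, instance))  # → {'a': True, 'c': True}
```

The result is an implicant of a strict majority of the trees (here, two of three) from which no further literal can be removed, matching the "prime implicant of a strict majority" definition; minimizing its size, by contrast, is what the abstract delegates to a Partial MaxSAT solver.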
Related papers
- Distilling interpretable causal trees from causal forests [0.0]
A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates.
This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest.
arXiv Detail & Related papers (2024-08-02T05:48:15Z)
- A New Random Forest Ensemble of Intuitionistic Fuzzy Decision Trees [5.831659043074847]
We propose a new random forest ensemble of intuitionistic fuzzy decision trees (IFDT)
The proposed method enjoys the power of the randomness from bootstrapped sampling and feature selection.
This study is the first to propose a random forest ensemble based on the intuitionistic fuzzy theory.
arXiv Detail & Related papers (2024-03-12T06:52:24Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions [93.40614719648386]
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning.
Recent works turn to retrieving external knowledge to augment CoT reasoning.
We propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree)
arXiv Detail & Related papers (2023-11-23T12:52:37Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Explaining Random Forests using Bipolar Argumentation and Markov Networks (Technical Report) [17.9926469947157]
Random forests are decision tree ensembles that can be used to solve a variety of machine learning problems.
In order to reason about the decision process, we propose representing it as an argumentation problem.
We generalize sufficient and necessary argumentative explanations using a Markov network encoding, discuss the relevance of these explanations and establish relationships to families of abductive explanations from the literature.
arXiv Detail & Related papers (2022-11-21T18:20:50Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
arXiv Detail & Related papers (2021-03-30T21:57:55Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Trees, forests, and impurity-based variable importance [0.0]
We analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI).
We prove that if input variables are independent and in absence of interactions, MDI provides a variance decomposition of the output.
Our analysis shows that there may exist some benefits to using a forest compared to a single tree.
arXiv Detail & Related papers (2020-01-13T14:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.