Trading Complexity for Sparsity in Random Forest Explanations
- URL: http://arxiv.org/abs/2108.05276v1
- Date: Wed, 11 Aug 2021 15:19:46 GMT
- Title: Trading Complexity for Sparsity in Random Forest Explanations
- Authors: Gilles Audemard and Steve Bellart and Louenas Bounia and Frédéric
Koriche and Jean-Marie Lagniez and Pierre Marquis
- Abstract summary: We introduce majoritary reasons which are prime implicants of a strict majority of decision trees.
Experiments conducted on various datasets reveal the existence of a trade-off between runtime complexity and sparsity.
- Score: 20.87501058448681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Random forests have long been considered as powerful model ensembles in
machine learning. By training multiple decision trees, whose diversity is
fostered through data and feature subsampling, the resulting random forest can
lead to more stable and reliable predictions than a single decision tree. This
however comes at the cost of decreased interpretability: while decision trees
are often easily interpretable, the predictions made by random forests are much
more difficult to understand, as they involve a majority vote over hundreds of
decision trees. In this paper, we examine different types of reasons that
explain "why" an input instance is classified as positive or negative by a
Boolean random forest. Notably, as an alternative to sufficient reasons taking
the form of prime implicants of the random forest, we introduce majoritary
reasons which are prime implicants of a strict majority of decision trees. For
these different abductive explanations, the tractability of the generation
problem (finding one reason) and the minimization problem (finding one shortest
reason) are investigated. Experiments conducted on various datasets reveal the
existence of a trade-off between runtime complexity and sparsity. Sufficient
reasons - for which the identification problem is DP-complete - are slightly
larger than majoritary reasons that can be generated using a simple linear-
time greedy algorithm, and significantly larger than minimal majoritary reasons
that can be approached using an anytime Partial MaxSAT algorithm.
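The greedy linear-time procedure mentioned in the abstract (start from the instance's full term and drop literals one at a time, keeping a drop only while a strict majority of trees still implies the positive class) can be sketched as follows. This is a minimal illustration under assumed conventions: the nested-tuple tree encoding, the function names, and the toy forest are hypothetical, not the authors' implementation.

```python
# Sketch of a greedy majoritary-reason computation for a Boolean random forest.
# A tree is either a bool leaf or a tuple (feature, false_branch, true_branch).

def tree_implies(tree, term):
    """True iff every completion of the partial assignment `term`
    (a dict feature -> bool) reaches a positive leaf of `tree`,
    i.e. `term` is an implicant of the tree."""
    if isinstance(tree, bool):          # leaf: class label
        return tree
    feat, low, high = tree              # internal node
    if feat in term:
        return tree_implies(high if term[feat] else low, term)
    # feature unset in the term: both branches must be positive
    return tree_implies(low, term) and tree_implies(high, term)

def majoritary_reason(forest, instance):
    """One greedy pass over the literals of `instance`: drop a literal
    if a strict majority of trees still implies the positive class."""
    term = dict(instance)
    for feat in list(instance):
        del term[feat]
        votes = sum(tree_implies(t, term) for t in forest)
        if 2 * votes <= len(forest):    # no strict majority: restore literal
            term[feat] = instance[feat]
    return term

# Toy forest of three stumps over features a, b, c.
forest = [("a", False, True),
          ("b", False, True),
          ("a", False, ("c", False, True))]
instance = {"a": True, "b": True, "c": True}
print(majoritary_reason(forest, instance))  # → {'a': True, 'c': True}
```

The result is an implicant of a strict majority of the trees (here, two of three) from which no further literal can be removed, matching the "prime implicant of a strict majority" definition; minimizing its size, by contrast, is what the abstract delegates to a Partial MaxSAT solver.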
Related papers
- Distilling interpretable causal trees from causal forests [0.0]
A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates.
This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest.
arXiv Detail & Related papers (2024-08-02T05:48:15Z)
- A New Random Forest Ensemble of Intuitionistic Fuzzy Decision Trees [5.831659043074847]
We propose a new random forest ensemble of intuitionistic fuzzy decision trees (IFDT)
The proposed method enjoys the power of the randomness from bootstrapped sampling and feature selection.
This study is the first to propose a random forest ensemble based on the intuitionistic fuzzy theory.
arXiv Detail & Related papers (2024-03-12T06:52:24Z)
- Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers [68.76846801719095]
We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles.
We show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled.
arXiv Detail & Related papers (2024-02-02T15:36:43Z)
- Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions [93.40614719648386]
Large language models (LLMs) are capable of answering knowledge-intensive complex questions with chain-of-thought (CoT) reasoning.
Recent works turn to retrieving external knowledge to augment CoT reasoning.
We propose a novel approach: Probabilistic Tree-of-thought Reasoning (ProbTree)
arXiv Detail & Related papers (2023-11-23T12:52:37Z)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z)
- Explaining Random Forests using Bipolar Argumentation and Markov Networks (Technical Report) [17.9926469947157]
Random forests are decision tree ensembles that can be used to solve a variety of machine learning problems.
In order to reason about the decision process, we propose representing it as an argumentation problem.
We generalize sufficient and necessary argumentative explanations using a Markov network encoding, discuss the relevance of these explanations and establish relationships to families of abductive explanations from the literature.
arXiv Detail & Related papers (2022-11-21T18:20:50Z)
- Search Methods for Sufficient, Socially-Aligned Feature Importance Explanations with In-Distribution Counterfactuals [72.00815192668193]
Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time.
We study several under-explored dimensions of FI-based explanations, providing conceptual and empirical improvements for this form of explanation.
arXiv Detail & Related papers (2021-06-01T20:36:48Z)
- Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest [8.513154770491898]
We argue that tree depth should be seen as a natural form of regularization across the entire procedure.
In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low.
arXiv Detail & Related papers (2021-03-30T21:57:55Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Trees, forests, and impurity-based variable importance [0.0]
We analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI).
We prove that if input variables are independent and in absence of interactions, MDI provides a variance decomposition of the output.
Our analysis shows that there may exist some benefits to using a forest compared to a single tree.
arXiv Detail & Related papers (2020-01-13T14:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.