Rule Covering for Interpretation and Boosting
- URL: http://arxiv.org/abs/2007.06379v2
- Date: Sat, 19 Sep 2020 13:56:58 GMT
- Title: Rule Covering for Interpretation and Boosting
- Authors: S. Ilker Birbil, Mert Edali, Birol Yuceoglu
- Abstract summary: We propose two algorithms for the interpretation and boosting of tree-based ensemble methods.
The first algorithm uses the collection of decision trees obtained from a trained random forest model.
The second algorithm uses a rule generation scheme for boosting decision trees.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose two algorithms for interpretation and boosting of tree-based
ensemble methods. Both algorithms make use of mathematical programming models
that are constructed with a set of rules extracted from an ensemble of decision
trees. The objective is to obtain the minimum total impurity with the least
number of rules that cover all the samples. The first algorithm uses the
collection of decision trees obtained from a trained random forest model. Our
numerical results show that the proposed rule covering approach selects only a
few rules that could be used for interpreting the random forest model.
Moreover, the resulting set of rules closely matches the accuracy level of the
random forest model. Inspired by the column generation algorithm in linear
programming, our second algorithm uses a rule generation scheme for boosting
decision trees. We use the dual optimal solutions of the linear programming
models as sample weights to obtain only those rules that would improve the
accuracy. With a computational study, we observe that our second algorithm
performs competitively with the other well-known boosting methods. Our
implementations also demonstrate that both algorithms can be trivially coupled
with the existing random forest and decision tree packages.
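To make the first algorithm concrete, here is a minimal sketch (not the authors' implementation): every leaf of a trained random forest is treated as a candidate rule, and a set-covering integer program picks a few low-impurity rules that together cover all training samples. It assumes scikit-learn and PuLP; the dataset, the `eps` trade-off weight, the forest depth, and all variable names are illustrative choices of ours, not the paper's.

```python
import numpy as np
import pulp
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
# Shallow trees give short, readable rules (an assumed setting, not the paper's).
forest = RandomForestClassifier(n_estimators=20, max_depth=4, random_state=0).fit(X, y)

# Enumerate candidate rules: one per (tree, leaf) pair, with the leaf's impurity.
impurity, covers = [], []          # covers[j]: boolean mask of samples in rule j's leaf
for est in forest.estimators_:
    leaf_of = est.apply(X)         # index of the leaf each sample falls into
    for leaf in np.unique(leaf_of):
        impurity.append(est.tree_.impurity[leaf])
        covers.append(leaf_of == leaf)

# Set covering: minimize total impurity plus a small per-rule cost eps,
# which trades impurity against the number of selected rules.
eps = 0.01
prob = pulp.LpProblem("rule_cover", pulp.LpMinimize)
use = [pulp.LpVariable(f"r{j}", cat="Binary") for j in range(len(covers))]
prob += pulp.lpSum((impurity[j] + eps) * use[j] for j in range(len(covers)))
for i in range(len(X)):            # every sample must be covered by some rule
    prob += pulp.lpSum(use[j] for j in range(len(covers)) if covers[j][i]) >= 1
prob.solve(pulp.PULP_CBC_CMD(msg=False))

selected = [j for j in range(len(covers)) if use[j].value() > 0.5]
print(f"selected {len(selected)} of {len(covers)} rules")
```

The second algorithm can be sketched as a column-generation-style loop on the same model: solve the LP relaxation, read off the dual price of each sample's covering constraint, and grow a new tree with those duals as sample weights so that its leaves become candidate rules for the next round. Continuing from the sketch above, the `boosting_round` helper, the tree depth, and the weight floor below are our assumptions, not the paper's exact procedure.

```python
from sklearn.tree import DecisionTreeClassifier

def boosting_round(X, y, covers, impurity, eps=0.01):
    # LP relaxation of the covering model (continuous variables yield duals).
    prob = pulp.LpProblem("lp_relax", pulp.LpMinimize)
    use = [pulp.LpVariable(f"r{j}", lowBound=0) for j in range(len(covers))]
    prob += pulp.lpSum((impurity[j] + eps) * use[j] for j in range(len(covers)))
    cons = []
    for i in range(len(X)):
        c = pulp.lpSum(use[j] for j in range(len(covers)) if covers[j][i]) >= 1
        prob += c
        cons.append(c)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    duals = np.array([c.pi for c in cons])     # one dual per sample constraint
    # Duals are large where the current rules cover a sample only at high cost,
    # so a tree grown with these weights targets exactly those samples.
    weights = np.maximum(duals, 1e-6)          # small floor; an implementation choice
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(X, y, sample_weight=weights)
    return tree                                # its leaves are new candidate rules
```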
Related papers
- Bregman-divergence-based Arimoto-Blahut algorithm (2024-08-10)
  We generalize the Arimoto-Blahut algorithm to a general function defined over a Bregman-divergence system.
  We propose a convex-optimization-free algorithm that can be applied to classical and quantum rate-distortion theory.
- A Unified Approach to Extract Interpretable Rules from Tree Ensembles via Integer Programming (2024-06-30)
  Tree ensemble methods are known for their effectiveness in supervised classification and regression tasks.
  Our work aims to extract an optimized list of rules from a trained tree ensemble, providing the user with a condensed, interpretable model.
- Learning Decision Trees and Forests with Algorithmic Recourse (2024-06-03)
  Algorithmic Recourse (AR) aims to provide a recourse action for altering an undesired prediction result given by a model.
  We formulate the task of learning an accurate classification tree under the constraint of ensuring the existence of reasonable actions for as many instances as possible.
- Optimal estimation of Gaussian (poly)trees (2024-02-09)
  We consider both problems of distribution learning (i.e., in KL distance) and structure learning (i.e., exact recovery).
  The first approach is based on the Chow-Liu algorithm and learns an optimal tree-structured distribution efficiently.
  The second approach is a modification of the PC algorithm for polytrees that uses partial correlation as a conditional-independence tester for constraint-based structure learning.
- On Universally Optimal Algorithms for A/B Testing (2023-08-23)
  We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards.
  For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that performs as well as the algorithm that samples each arm equally.
- Learning the Positions in CountSketch (2023-06-11)
  We consider sketching algorithms that first compress data by multiplication with a random sketch matrix and then apply the sketch to quickly solve an optimization problem.
  In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
- Explaining random forest prediction through diverse rulesets (2022-03-29)
  Local Tree eXtractor (LTreeX) is able to explain the forest prediction for a given test instance with a few diverse rules.
  We show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
- E2E-FS: An End-to-End Feature Selection Method for Neural Networks (2020-12-14)
  We present a novel selection algorithm called End-to-End Feature Selection (E2E-FS).
  Our algorithm, similar to the lasso approach, is solved with gradient descent techniques.
  Despite its hard restrictions, experimental results show that this algorithm can be used with any learning model.
- Efficient Computation of Expectations under Spanning Tree Distributions (2020-08-29)
  We propose unified algorithms for the important cases of first-order and second-order expectations in edge-factored, non-projective spanning-tree models.
  Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
- MurTree: Optimal Classification Trees via Dynamic Programming and Search (2020-07-24)
  We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
  Our approach uses only a fraction of the time required by the state of the art and can handle datasets with tens of thousands of instances.
- Model Selection in Contextual Stochastic Bandit Problems (2020-03-03)
  We develop a meta-algorithm that selects between base algorithms.
  We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.