Better Short than Greedy: Interpretable Models through Optimal Rule
Boosting
- URL: http://arxiv.org/abs/2101.08380v1
- Date: Thu, 21 Jan 2021 01:03:48 GMT
- Title: Better Short than Greedy: Interpretable Models through Optimal Rule
Boosting
- Authors: Mario Boley and Simon Teshuva and Pierre Le Bodic and Geoffrey I Webb
- Abstract summary: Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability.
We present a novel approach aiming to fit rule ensembles of maximal predictive power for a given ensemble size.
- Score: 10.938624307941197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rule ensembles are designed to provide a useful trade-off between predictive
accuracy and model interpretability. However, the myopic and random search
components of current rule ensemble methods can compromise this goal: they
often need more rules than necessary to reach a certain accuracy level or can
even outright fail to accurately model a distribution that can actually be
described well with a few rules. Here, we present a novel approach aiming to
fit rule ensembles of maximal predictive power for a given ensemble size (and
thus model comprehensibility). In particular, we present an efficient
branch-and-bound algorithm that optimally solves the per-rule objective
function of the popular second-order gradient boosting framework. Our main
insight is that the boosting objective can be tightly bounded in time linear in
the number of covered data points. Along with an additional novel pruning
technique related to rule redundancy, this leads to a computationally feasible
approach for boosting optimal rules that, as we demonstrate on a wide range of
common benchmark problems, consistently outperforms the predictive performance
of boosting greedy rules.
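
As a rough illustration of this insight (our sketch, not the authors' code), the functions below show the per-rule objective of second-order gradient boosting and a bound over all refinements of a rule, specialised to squared loss where every Hessian entry equals 1; the names `rule_gain` and `coverage_bound` are ours.

```python
import numpy as np

def rule_gain(g, lam=1.0):
    """Per-rule objective of second-order gradient boosting, specialised
    to squared loss where every Hessian entry is 1: the gain of a rule
    covering the points with gradient vector g (up to constants)."""
    return g.sum() ** 2 / (len(g) + lam)

def coverage_bound(g, lam=1.0):
    """Upper bound on rule_gain over every subset of the covered points.
    For a fixed subset size k the gain is maximised by the k largest or
    the k smallest gradients, so one sort followed by prefix sums
    (linear in the number of covered points) bounds all refinements."""
    gs = np.sort(g)
    ks = np.arange(1, len(gs) + 1)
    best = 0.0
    for prefix in (np.cumsum(gs), np.cumsum(gs[::-1])):
        best = max(best, float((prefix ** 2 / (ks + lam)).max()))
    return best
```

In a branch-and-bound search over rule conditions, any branch whose `coverage_bound` falls below the gain of the best rule found so far can be pruned without losing optimality.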
Related papers
- Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles [10.40809014729148]
Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models.
We propose a new objective function that measures the angle between the risk gradient vector and the projection of the condition output vector onto the orthogonal complement of the already selected conditions.
This approach correctly approximates the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general and thus shorter rules.
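
A minimal numpy sketch of this geometric criterion as described above (our illustration; the function name and exact normalisation are assumptions): score a candidate condition by the cosine of the angle between the risk gradient and the component of the condition's output vector orthogonal to the already selected conditions.

```python
import numpy as np

def ogb_angle_score(r, q, Q=None):
    """Cosine of the angle between the risk gradient r and the component
    of a candidate condition's output vector q that is orthogonal to the
    output vectors of already selected conditions (the columns of Q).
    A condition collinear with previously selected ones scores 0."""
    if Q is not None and Q.size:
        coef, *_ = np.linalg.lstsq(Q, q, rcond=None)  # project onto span(Q)
        q = q - Q @ coef                              # keep orthogonal part
    denom = np.linalg.norm(q) * np.linalg.norm(r)
    return 0.0 if denom == 0.0 else float(abs(r @ q) / denom)
```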
arXiv Detail & Related papers (2024-02-24T02:29:10Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
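
To make the likelihood-ratio principle concrete, here is a toy anytime-valid confidence sequence for a Bernoulli mean built from the standard plug-in likelihood-ratio martingale; this illustrates the general idea and is not necessarily the paper's construction.

```python
import numpy as np

def lr_confidence_sequence(xs, alpha=0.05, grid=None):
    """Confidence set for a Bernoulli mean after observing xs, via the
    plug-in likelihood-ratio martingale: theta survives while
    prod_i p_hat_i(x_i) / p_theta(x_i) < 1/alpha, where p_hat_i is the
    (smoothed) running MLE fitted on past observations only.  Anytime
    validity follows from Ville's inequality."""
    if grid is None:
        grid = np.linspace(1e-3, 1 - 1e-3, 999)
    log_num, log_den = 0.0, np.zeros_like(grid)
    s, n = 1.0, 2.0                        # Laplace-smoothed counts
    for x in xs:
        that = s / n                       # prediction from the past
        log_num += np.log(that if x else 1.0 - that)
        log_den += np.log(grid if x else 1.0 - grid)
        s, n = s + x, n + 1.0
    return grid[log_num - log_den < np.log(1.0 / alpha)]
```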
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- Obtaining Explainable Classification Models using Distributionally Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between the sparsity of a rule set and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z)
- CGXplain: Rule-Based Deep Neural Network Explanations Using Dual Linear Programs [4.632241550169363]
Rule-based surrogate models are an effective way to approximate a Deep Neural Network's (DNN) decision boundaries.
This paper introduces CGX (Column Generation eXplainer), a dual-linear-programming approach that addresses limitations of existing rule-based surrogate methods.
arXiv Detail & Related papers (2023-04-11T13:16:26Z)
- Towards Target Sequential Rules [52.4562332499155]
We propose an efficient algorithm called targeted sequential rule mining (TaSRM).
Experiments show that TaSRM and its variants outperform the existing baseline algorithm.
arXiv Detail & Related papers (2022-06-09T18:59:54Z)
- Bayes Point Rule Set Learning [5.065947993017157]
Interpretability plays an increasingly important role in the design of machine learning algorithms.
Disjunctive Normal Forms are arguably the most interpretable way to express a set of rules.
We propose an effective bottom-up extension of the popular FIND-S algorithm to learn DNF-type rulesets.
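
For reference, the textbook FIND-S procedure that this method extends (the paper's bottom-up DNF extension is not reproduced here):

```python
def find_s(examples):
    """Textbook FIND-S: the most specific conjunction of attribute
    constraints consistent with the positive examples ('?' matches
    anything).  Negative examples are ignored by the base algorithm."""
    h = None
    for x, label in examples:
        if not label:
            continue
        if h is None:
            h = list(x)                       # first positive: copy it
        else:                                 # minimally generalise
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return h

# find_s([(("sunny", "warm"), True), (("sunny", "cold"), True)])
# -> ['sunny', '?']
```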
arXiv Detail & Related papers (2022-04-11T16:50:41Z)
- Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
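
The basic sequential Orthogonal Matching Pursuit loop referenced above looks roughly as follows (a sketch of the classic algorithm; the paper's contribution is making this style of selection run in few adaptive rounds, which is not shown):

```python
import numpy as np

def omp_select(X, y, k):
    """Select k feature columns of X by Orthogonal Matching Pursuit:
    repeatedly add the feature most correlated with the residual, then
    refit least squares on the selected set (basic sequential loop)."""
    selected, resid = [], y.astype(float)
    for _ in range(k):
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf            # never reselect a feature
        selected.append(int(scores.argmax()))
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        resid = y - X[:, selected] @ beta
    return selected
```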
arXiv Detail & Related papers (2022-02-28T12:26:47Z)
- Interpretable and Fair Boolean Rule Sets via Column Generation [18.08486863429421]
An integer program is formulated to optimally trade classification accuracy for rule simplicity.
We consider the fairness setting and extend the formulation to include explicit constraints on two different measures of classification parity.
Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.
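
Schematically, and only as our hedged reading of the summary, such a program trades off accuracy against simplicity subject to a parity constraint, along the lines of:

```latex
\min_{w \in \{0,1\}^K}\; \sum_{i=1}^{n} \ell_i(w) \;+\; \lambda \sum_{k=1}^{K} c_k w_k
\qquad \text{s.t.} \qquad
\bigl|\,\mathrm{rate}(w \mid A=a) - \mathrm{rate}(w \mid A=b)\,\bigr| \le \epsilon
```

Here the binary variable w_k selects clause k with complexity cost c_k, the loss term counts misclassifications, and the constraint bounds the gap in a classification-parity measure between groups a and b of a protected attribute A; the paper's exact formulation and its column-generation solution strategy may differ.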
arXiv Detail & Related papers (2021-11-16T13:40:28Z)
- Discovering Useful Compact Sets of Sequential Rules in a Long Sequence [57.684967309375274]
COSSU is an algorithm to mine small and meaningful sets of sequential rules.
We show that COSSU can successfully retrieve relevant sets of closed sequential rules from a long sequence.
arXiv Detail & Related papers (2021-09-15T18:25:18Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
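
Gradient clipping itself is simple; the paper's novelty is the stepsize rules paired with it, which are not reproduced in this minimal sketch:

```python
import numpy as np

def clip(g, lam):
    """Rescale g to norm at most lam (standard gradient clipping); this
    tames heavy-tailed gradient noise."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

def clipped_sgd_step(x, grad_fn, step, lam):
    """One SGD step with a clipped stochastic gradient.  The paper's
    contribution is the schedule for `step` and `lam`, not shown here."""
    return x - step * clip(grad_fn(x), lam)
```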
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
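
For reference, the CNML distribution that ACNML amortizes normalizes the plug-in likelihoods obtained by retraining on the query point with each candidate label:

```latex
p_{\mathrm{CNML}}(y \mid x) \;=\;
\frac{p_{\hat\theta(\mathcal{D} \cup \{(x, y)\})}(y \mid x)}
     {\sum_{y'} p_{\hat\theta(\mathcal{D} \cup \{(x, y')\})}(y' \mid x)}
```

Here the hatted theta denotes the maximum-likelihood estimate on the augmented dataset; computing it exactly requires one retraining per candidate label, which is the cost that amortization targets.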
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.