A Nested Genetic Algorithm for Explaining Classification Data Sets with
Decision Rules
- URL: http://arxiv.org/abs/2209.07575v1
- Date: Tue, 23 Aug 2022 11:42:15 GMT
- Title: A Nested Genetic Algorithm for Explaining Classification Data Sets with
Decision Rules
- Authors: Paul-Amaury Matt, Rosina Ziegler, Danilo Brajovic, Marco Roth and
Marco F. Huber
- Abstract summary: We aim to automatically extract a set of decision rules (rule set) that best explains a classification data set.
First, a large set of decision rules is extracted from a set of decision trees trained on the data set.
The rule set should be concise, accurate, have a maximum coverage and minimum number of inconsistencies.
- Score: 3.8271082752302137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our goal in this paper is to automatically extract a set of decision rules
(rule set) that best explains a classification data set. First, a large set of
decision rules is extracted from a set of decision trees trained on the data
set. The rule set should be concise, accurate, have a maximum coverage and
minimum number of inconsistencies. This problem can be formalized as a modified
version of the weighted budgeted maximum coverage problem, known to be NP-hard.
To solve the combinatorial optimization problem efficiently, we introduce a
nested genetic algorithm which we then use to derive explanations for ten
public data sets.
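To make the selection step concrete, the following is a minimal sketch of a plain, single-level genetic algorithm that picks a budgeted subset of rules to maximise weighted coverage while penalising inconsistencies. The coverage data are synthetic, and the budget, fitness weights and genetic operators are illustrative assumptions; the paper's actual method is a nested genetic algorithm, which this sketch does not reproduce.

```python
# Minimal sketch (assumptions: plain single-level GA, synthetic coverage data,
# illustrative fitness weights and budget) -- NOT the authors' nested GA.
import random

random.seed(0)

N_RULES, N_EXAMPLES, BUDGET = 50, 200, 8

# covers[r] = example indices covered by rule r (synthetic stand-in)
covers = [set(random.sample(range(N_EXAMPLES), random.randint(5, 40)))
          for _ in range(N_RULES)]
# correct[r] = covered examples where rule r predicts the right class (synthetic)
correct = [set(e for e in cov if random.random() < 0.8) for cov in covers]

def fitness(mask):
    """Weighted coverage minus inconsistencies, with a hard budget on rule count."""
    chosen = [r for r in range(N_RULES) if mask[r]]
    if len(chosen) > BUDGET:
        return float("-inf")                       # infeasible: too many rules
    covered = set().union(*(covers[r] for r in chosen)) if chosen else set()
    consistent = set().union(*(correct[r] for r in chosen)) if chosen else set()
    inconsistencies = len(covered - consistent)
    return len(covered) - 2.0 * inconsistencies    # illustrative weighting

def random_mask():
    mask = [0] * N_RULES
    for r in random.sample(range(N_RULES), BUDGET):
        mask[r] = 1
    return mask

def crossover(a, b):
    cut = random.randrange(1, N_RULES)
    return a[:cut] + b[cut:]

def mutate(mask, p=0.02):
    return [bit ^ (random.random() < p) for bit in mask]

population = [random_mask() for _ in range(60)]
for _ in range(100):                               # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:20]                      # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(40)]
    population = parents + children

best = max(population, key=fitness)
print("best fitness:", fitness(best),
      "rules:", [r for r in range(N_RULES) if best[r]])
```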
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
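As a rough illustration of the explore-then-commit idea in a setting where only the joint reward of the chosen set of arms is observed, the sketch below assumes an additive reward model, estimates per-arm means by least squares from random exploration rounds, and then commits to the greedy top-n set. All constants are placeholders and this is not the SGB algorithm itself.

```python
# Hedged sketch: explore-then-commit with greedy subset selection; the additive
# reward model and least-squares estimation are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(1)
K, n, T_explore = 20, 5, 400
true_means = rng.uniform(0, 1, K)

# Exploration: play random n-subsets, observe only the noisy joint (sum) reward.
X = np.zeros((T_explore, K))
y = np.zeros(T_explore)
for t in range(T_explore):
    arms = rng.choice(K, size=n, replace=False)
    X[t, arms] = 1.0
    y[t] = true_means[arms].sum() + rng.normal(0, 0.1)

# Estimate per-arm means from joint rewards (assumed additive), then commit greedily.
est, *_ = np.linalg.lstsq(X, y, rcond=None)
committed = np.argsort(est)[-n:]
print("committed arms:", sorted(committed.tolist()))
print("true top arms: ", sorted(np.argsort(true_means)[-n:].tolist()))
```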
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Concomitant Group Testing [49.50984893039441]
We introduce a variation of the group testing problem capturing the idea that a positive test requires a combination of multiple "types" of items.
The goal is to reliably identify all of the semi-defective sets using as few tests as possible.
Our algorithms are distinguished by (i) whether they are deterministic (zero-error) or randomized (small-error), and (ii) whether they are non-adaptive, fully adaptive, or have limited adaptivity.
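For background, the sketch below shows classical probabilistic group testing with the simple COMP decoder, in which any item appearing in a negative test is cleared. The concomitant / semi-defective-set variant introduced in the paper is not modelled here; the design density and problem sizes are illustrative.

```python
# Hedged sketch: classical group testing with a random Bernoulli design and the
# COMP decoder (clear every item that appears in some negative test).
import numpy as np

rng = np.random.default_rng(5)
n_items, n_defective, n_tests = 200, 5, 60
defective = set(rng.choice(n_items, size=n_defective, replace=False).tolist())

# Random design: each item joins each pooled test independently.
design = rng.random((n_tests, n_items)) < 0.03
outcomes = np.array([any(i in defective for i in np.flatnonzero(row)) for row in design])

# COMP: every item that appears in at least one negative test is declared non-defective.
cleared = set()
for row, positive in zip(design, outcomes):
    if not positive:
        cleared.update(np.flatnonzero(row).tolist())
declared = set(range(n_items)) - cleared

print("true defectives:    ", sorted(defective))
print("declared defectives:", sorted(declared))  # superset of the true set by construction
```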
arXiv Detail & Related papers (2023-09-08T09:11:12Z)
- FIRE: An Optimization Approach for Fast Interpretable Rule Extraction [7.538482310185135]
We present FIRE, Fast Interpretable Rule Extraction, an optimization-based framework to extract a small collection of decision rules from tree ensembles.
We show in our experiments that FIRE outperforms state-of-the-art ensemble algorithms at building sparse rule sets, and can deliver more interpretable models compared to existing methods.
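The general recipe, mining candidate rules from a tree ensemble and then keeping a sparse subset, can be sketched as follows. Rules are root-to-leaf paths of a small scikit-learn random forest and the selection step is plain L1-regularised logistic regression on rule activations (a RuleFit-style baseline), not FIRE's optimisation framework; the dataset and regularisation strength are illustrative.

```python
# Hedged sketch: extract rules from a tree ensemble, then keep a sparse subset
# via L1-regularised logistic regression (not FIRE's actual optimiser).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

def extract_rules(tree):
    """Return each root-to-leaf path as a list of (feature, '<=' or '>', threshold)."""
    t, rules = tree.tree_, []
    def walk(node, conds):
        if t.children_left[node] == -1:            # leaf node
            rules.append(conds)
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], conds + [(f, "<=", thr)])
        walk(t.children_right[node], conds + [(f, ">", thr)])
    walk(0, [])
    return rules

rules = [r for est in forest.estimators_ for r in extract_rules(est)]

def activations(rule, X):
    mask = np.ones(len(X), dtype=bool)
    for f, op, thr in rule:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask.astype(float)

A = np.column_stack([activations(r, X) for r in rules])   # examples x rules
selector = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(A, y)
kept = np.flatnonzero(selector.coef_[0])
print(f"{len(rules)} candidate rules -> {len(kept)} selected")
```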
arXiv Detail & Related papers (2023-06-12T21:27:39Z)
- Efficient learning of large sets of locally optimal classification rules [0.0]
Conventional rule learning algorithms aim at finding a set of simple rules, where each rule covers as many examples as possible.
In this paper, we argue that the rules found in this way may not be the optimal explanations for each of the examples they cover.
We propose an efficient algorithm that aims at finding the best rule covering each training example in a greedy optimization consisting of one specialization and one generalization loop.
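A toy version of a specialization loop is sketched below: starting from an empty rule for one seed example, it greedily adds the threshold condition that most improves precision while keeping the seed covered. The generalization loop and the paper's actual heuristics are omitted; the dataset, seed index and candidate thresholds are illustrative.

```python
# Hedged toy specialization loop for explaining one training example; not the
# paper's algorithm (its generalization loop and search heuristics are omitted).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
seed = 60                                    # index of the example we want to explain
target = y[seed]

def coverage(conds):
    mask = np.ones(len(X), dtype=bool)
    for f, op, thr in conds:
        mask &= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return mask

def precision(conds):
    mask = coverage(conds)
    return (y[mask] == target).mean() if mask.any() else 0.0

rule = []
while precision(rule) < 1.0:
    best, best_prec = None, precision(rule)
    # Candidate conditions: thresholds at the seed example's own feature values.
    for f in range(X.shape[1]):
        for cond in [(f, "<=", X[seed, f]), (f, ">", X[seed, f] - 1e-9)]:
            p = precision(rule + [cond])
            if coverage(rule + [cond])[seed] and p > best_prec:
                best, best_prec = cond, p
    if best is None:                         # no condition improves precision further
        break
    rule.append(best)

print("rule for example", seed, ":", rule, "precision:", round(precision(rule), 3))
```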
arXiv Detail & Related papers (2023-01-24T11:40:28Z)
- Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
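A plain Orthogonal Matching Pursuit loop for feature selection is sketched below on synthetic regression data; the fairness constraints, non-submodular objectives and the parallel adaptive query model analysed in the paper are not reproduced.

```python
# Hedged sketch of plain OMP feature selection on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 5
X = rng.normal(size=(n, d))
true_support = rng.choice(d, size=k, replace=False)
y = X[:, true_support] @ rng.normal(size=k) + 0.05 * rng.normal(size=n)

selected, residual = [], y.copy()
for _ in range(k):
    # Pick the feature most correlated with the current residual...
    scores = np.abs(X.T @ residual)
    scores[selected] = -np.inf
    selected.append(int(np.argmax(scores)))
    # ...then refit on the selected set and update the residual.
    coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    residual = y - X[:, selected] @ coef

print("true support: ", sorted(true_support.tolist()))
print("OMP selection:", sorted(selected))
```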
arXiv Detail & Related papers (2022-02-28T12:26:47Z)
- Interpretable and Fair Boolean Rule Sets via Column Generation [18.08486863429421]
An integer program is formulated to optimally trade classification accuracy for rule simplicity.
We consider the fairness setting and extend the formulation to include explicit constraints on two different measures of classification parity.
Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.
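A much-simplified binary program in this spirit is sketched below: over a small fixed pool of candidate clauses, it trades misclassification against rule-set complexity. The column generation procedure and the parity constraints from the paper are omitted; the synthetic data, the complexity weight and the use of the PuLP/CBC solver are assumptions for illustration only.

```python
# Hedged, much-simplified rule-selection IP over a fixed clause pool; not the
# paper's column generation formulation or its fairness constraints.
import numpy as np
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum, value

rng = np.random.default_rng(4)
n, K = 120, 15
covers = rng.random((n, K)) < 0.2           # covers[i, k]: clause k fires on example i
y = rng.integers(0, 2, n)                   # binary labels
lam = 0.3                                   # weight on rule-set complexity (illustrative)

pos = [i for i in range(n) if y[i] == 1]
neg = [i for i in range(n) if y[i] == 0]

prob = LpProblem("boolean_rule_set", LpMinimize)
w = [LpVariable(f"w{k}", cat=LpBinary) for k in range(K)]       # clause selected?
xi = [LpVariable(f"xi{i}", lowBound=0) for i in pos]            # slack for missed positives

# A positive example is either covered by a selected clause or pays slack xi_i;
# a negative example pays one unit per selected clause that fires on it.
false_neg = lpSum(xi)
false_pos = lpSum(w[k] for i in neg for k in range(K) if covers[i, k])
prob += false_neg + false_pos + lam * lpSum(w)
for idx, i in enumerate(pos):
    prob += xi[idx] + lpSum(w[k] for k in range(K) if covers[i, k]) >= 1

prob.solve()
print("selected clauses:", [k for k in range(K) if value(w[k]) > 0.5])
```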
arXiv Detail & Related papers (2021-11-16T13:40:28Z)
- Certifiably Polynomial Algorithm for Best Group Subset Selection [0.9667631210393929]
Best group subset selection aims to choose a small part of non-overlapping groups to achieve the best interpretability on the response variable.
We propose a group-splicing algorithm that iteratively detects effective groups and excludes the uninformative ones.
We demonstrate the efficiency and accuracy of our proposal by comparing state-of-the-art algorithms on both synthetic and real-world datasets.
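A toy exchange-style selection in this spirit is sketched below: starting from an arbitrary set of groups, it swaps an included group for an excluded one whenever the swap lowers the least-squares loss. This simplified stand-in carries no certificate and is not the paper's algorithm; the group structure and noise level are illustrative.

```python
# Hedged toy group-subset selection via exchange ("splicing"-style) swaps.
import numpy as np

rng = np.random.default_rng(2)
n, n_groups, group_size, s = 150, 10, 3, 3
groups = [list(range(g * group_size, (g + 1) * group_size)) for g in range(n_groups)]
X = rng.normal(size=(n, n_groups * group_size))
active = [0, 4, 7]                                        # truly relevant groups
beta = np.zeros(n_groups * group_size)
for g in active:
    beta[groups[g]] = rng.normal(size=group_size)
y = X @ beta + 0.1 * rng.normal(size=n)

def loss(selected_groups):
    cols = [j for g in selected_groups for j in groups[g]]
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    return float(np.sum((y - X[:, cols] @ coef) ** 2))

selected = list(range(s))                                 # arbitrary initial guess
improved = True
while improved:
    improved = False
    for g_out in list(selected):
        for g_in in range(n_groups):
            if g_in in selected:
                continue
            candidate = [g for g in selected if g != g_out] + [g_in]
            if loss(candidate) < loss(selected):          # accept strictly improving swaps
                selected, improved = candidate, True
                break
        if improved:
            break

print("recovered groups:", sorted(selected), "true groups:", active)
```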
arXiv Detail & Related papers (2021-04-23T03:05:11Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
- Extreme Algorithm Selection With Dyadic Feature Representation [78.13985819417974]
We propose the setting of extreme algorithm selection (XAS) where we consider fixed sets of thousands of candidate algorithms.
We assess the applicability of state-of-the-art AS techniques to the XAS setting and propose approaches leveraging a dyadic feature representation.
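One way to read "dyadic feature representation" is sketched below: each (dataset, algorithm) pair is represented by concatenating dataset meta-features with algorithm features, a single surrogate model scores the pair, and candidate algorithms are ranked for a new dataset. The features, surrogate model and synthetic performance data are placeholders, not the paper's setup.

```python
# Hedged sketch of a dyadic (dataset, algorithm) surrogate; all data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n_datasets, n_algorithms = 40, 200
dataset_feats = rng.normal(size=(n_datasets, 6))      # stand-in for dataset meta-features
algo_feats = rng.normal(size=(n_algorithms, 4))       # stand-in for algorithm descriptors

# Synthetic "observed performance" of algorithm j on dataset i.
perf = dataset_feats @ rng.normal(size=(6, 4)) @ algo_feats.T \
       + 0.1 * rng.normal(size=(n_datasets, n_algorithms))

# Build dyadic training pairs from a sparse sample of (dataset, algorithm) runs.
rows = rng.integers(0, n_datasets, 3000)
cols = rng.integers(0, n_algorithms, 3000)
X_pairs = [np.concatenate([dataset_feats[i], algo_feats[j]]) for i, j in zip(rows, cols)]
y_pairs = [perf[i, j] for i, j in zip(rows, cols)]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_pairs, y_pairs)

# Rank all candidate algorithms for a held-out dataset by predicted performance.
new_ds = rng.normal(size=6)
scores = model.predict([np.concatenate([new_ds, algo_feats[j]]) for j in range(n_algorithms)])
print("top-5 suggested algorithms:", np.argsort(scores)[-5:][::-1].tolist())
```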
arXiv Detail & Related papers (2020-01-29T09:40:58Z)
- Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.