Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules
of Nonlinear Decision Trees for Classification Problems
- URL: http://arxiv.org/abs/2008.00410v1
- Date: Sun, 2 Aug 2020 06:35:32 GMT
- Title: Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules
of Nonlinear Decision Trees for Classification Problems
- Authors: Yashesh Dhebar and Kalyanmoy Deb
- Abstract summary: We represent a classifier as an assembly of simple mathematical rules using a non-linear decision tree (NLDT).
By restricting the structure of split-rule at each conditional node and depth of the decision tree, the interpretability of the classifier is assured.
Results on two to 500-feature problems are encouraging and open up further scopes of applying the proposed approach to more challenging and complex classification tasks.
- Score: 7.8140593450932965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For supervised classification problems involving design, control, and
other practical purposes, users are not only interested in finding a highly
accurate classifier, but they also demand that the obtained classifier be easily
interpretable. While the definition of interpretability of a classifier can
vary from case to case, here we restrict a humanly interpretable classifier to
one expressed in simple mathematical terms. As a novel approach, we
represent a classifier as an assembly of simple mathematical rules using a
non-linear decision tree (NLDT). Each conditional (non-terminal) node of the
tree represents a non-linear mathematical rule (split-rule) involving features
in order to partition the dataset in the given conditional node into two
non-overlapping subsets. This partitioning is intended to minimize the impurity
of the resulting child nodes. By restricting the structure of split-rule at
each conditional node and depth of the decision tree, the interpretability of
the classifier is assured. The non-linear split-rule at a given conditional
node is obtained using an evolutionary bilevel optimization algorithm, in which
while the upper-level focuses on arriving at an interpretable structure of the
split-rule, the lower-level achieves the most appropriate weights
(coefficients) of individual constituents of the rule to minimize the net
impurity of two resulting child nodes. The performance of the proposed
algorithm is demonstrated on a number of controlled test problems, existing
benchmark problems, and industrial problems. Results on two to 500-feature
problems are encouraging and open up further scopes of applying the proposed
approach to more challenging and complex classification tasks.
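The bilevel idea described above can be sketched in miniature: an upper level proposes a split-rule structure, and a lower level fits that structure's coefficients to minimize the weighted impurity of the two child nodes. Everything below is an illustrative assumption, not the authors' actual method: power-law terms stand in for the rule structure, exhaustive enumeration replaces the upper-level evolutionary search, and random search replaces the lower-level optimizer.

```python
# Minimal sketch of a bilevel nonlinear split-rule search (illustrative only).
import itertools
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(X, y, rule):
    """Weighted impurity of the two child nodes induced by rule(x) <= 0."""
    left = [yi for xi, yi in zip(X, y) if rule(xi) <= 0]
    right = [yi for xi, yi in zip(X, y) if rule(xi) > 0]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def lower_level(X, y, exponents, trials=200, seed=0):
    """Lower level: for a fixed rule structure (one exponent per feature),
    search for coefficients w and bias b that minimize child-node impurity.
    Random search stands in for the paper's lower-level optimizer."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(len(exponents))]
        b = rng.uniform(-1, 1)
        rule = lambda x, w=w, b=b: sum(
            wi * (x[i] ** ei) for i, (wi, ei) in enumerate(zip(w, exponents))
        ) + b
        imp = split_impurity(X, y, rule)
        if imp < best[0]:
            best = (imp, rule)
    return best

def upper_level(X, y, exponent_choices=(0, 1, 2)):
    """Upper level: enumerate simple power-law structures; each candidate is
    scored by its lower-level optimum. Enumeration replaces the paper's
    evolutionary structure search and is only feasible for tiny dimensions."""
    d = len(X[0])
    best = (float("inf"), None, None)
    for exps in itertools.product(exponent_choices, repeat=d):
        if all(e == 0 for e in exps):
            continue  # a constant rule cannot split the data
        imp, rule = lower_level(X, y, exps)
        if imp < best[0]:
            best = (imp, exps, rule)
    return best

# Toy dataset: the class depends on distance from the origin (a circular
# boundary), which no single linear split captures but a quadratic rule can.
X = [(0.1, 0.2), (0.2, -0.1), (-0.1, 0.1), (0.9, 0.8), (-0.8, 0.9), (0.7, -0.9)]
y = [0, 0, 0, 1, 1, 1]
imp, exps, rule = upper_level(X, y)
print("structure (exponents):", exps, "impurity:", round(imp, 3))
```

In the actual NLDT, this search runs at every conditional node, and interpretability is enforced by bounding the number of terms in the rule and the depth of the tree rather than by the crude enumeration used here.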
Related papers
- A Fast Interpretable Fuzzy Tree Learner [2.564905016909138]
Fuzzy rule-based systems have been mostly used in interpretable decision-making. Interpretability requires both sensible linguistic partitions and small rule-base sizes. We propose an adaptation of classical tree-based splitting algorithms from crisp rules to fuzzy trees.
arXiv Detail & Related papers (2025-12-12T14:51:07Z) - Decision Tree Embedding by Leaf-Means [11.318593165494724]
Decision Tree Embedding (DTE) is a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure. We establish several population-level theoretical properties of DTE, including its preservation of conditional density under mild conditions.
arXiv Detail & Related papers (2025-12-01T15:57:33Z) - FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization [0.0]
We introduce a fast semi-supervised embedding framework that jointly optimizes three complementary objectives. On standard benchmarks, our method consistently achieves classification accuracy on par with or superior to state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-13T10:39:58Z) - Compact Rule-Based Classifier Learning via Gradient Descent [0.7874708385247353]
Rule-based models play a crucial role in scenarios that require transparency and accountable decision-making.
We introduce a new rule-based classifier trained using gradient descent, in which the user can control the maximum number and length of the rules.
For numerical partitions, the user can also control the partitions used with fuzzy sets, which also helps keep the number of partitions small.
arXiv Detail & Related papers (2025-02-03T14:13:39Z) - Conformal Prediction in Hierarchical Classification [18.730305100193927]
We extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy.
The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of complexity.
Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2025-01-31T11:10:19Z) - Obtaining Explainable Classification Models using Distributionally
Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between rule set sparsity and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z) - Efficient Link Prediction via GNN Layers Induced by Negative Sampling [92.05291395292537]
Graph neural networks (GNNs) for link prediction can loosely be divided into two broad categories.
First, node-wise architectures pre-compute individual embeddings for each node that are later combined by a simple decoder to make predictions.
Second, edge-wise methods rely on the formation of edge-specific subgraph embeddings to enrich the representation of pair-wise relationships.
arXiv Detail & Related papers (2023-10-14T07:02:54Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Regularized impurity reduction: Accurate decision trees with complexity
guarantees [20.170305081348328]
We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity.
The enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
arXiv Detail & Related papers (2022-08-23T13:15:19Z) - Set-valued prediction in hierarchical classification with constrained
representation complexity [4.258263831866309]
We focus on hierarchical multi-class classification problems, where valid sets correspond to internal nodes of the hierarchy.
We propose three methods and evaluate them on benchmark datasets.
arXiv Detail & Related papers (2022-03-13T15:13:19Z) - Multiple Classifiers Based Maximum Classifier Discrepancy for
Unsupervised Domain Adaptation [25.114533037440896]
We propose to extend the structure of two classifiers to multiple classifiers to further boost its performance.
We demonstrate that, on average, adopting the structure of three classifiers normally yields the best performance as a trade-off between the accuracy and efficiency.
arXiv Detail & Related papers (2021-08-02T03:00:13Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z) - Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.