Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules
of Nonlinear Decision Trees for Classification Problems
- URL: http://arxiv.org/abs/2008.00410v1
- Date: Sun, 2 Aug 2020 06:35:32 GMT
- Title: Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules
of Nonlinear Decision Trees for Classification Problems
- Authors: Yashesh Dhebar and Kalyanmoy Deb
- Abstract summary: We represent a classifier as an assembly of simple mathematical rules using a non-linear decision tree (NLDT).
By restricting the structure of split-rule at each conditional node and depth of the decision tree, the interpretability of the classifier is assured.
Results on two to 500-feature problems are encouraging and open up further scopes of applying the proposed approach to more challenging and complex classification tasks.
- Score: 7.8140593450932965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For supervised classification problems involving design, control, and
other practical purposes, users are not only interested in finding a highly
accurate classifier, but they also demand that the obtained classifier be easily
interpretable. While the definition of interpretability of a classifier can
vary from case to case, here we restrict a humanly interpretable classifier to
one expressed in simple mathematical terms. As a novel approach, we
represent a classifier as an assembly of simple mathematical rules using a
non-linear decision tree (NLDT). Each conditional (non-terminal) node of the
tree represents a non-linear mathematical rule (split-rule) involving features
in order to partition the dataset in the given conditional node into two
non-overlapping subsets. This partitioning is intended to minimize the impurity
of the resulting child nodes. By restricting the structure of split-rule at
each conditional node and depth of the decision tree, the interpretability of
the classifier is assured. The non-linear split-rule at a given conditional
node is obtained using an evolutionary bilevel optimization algorithm, in which
while the upper-level focuses on arriving at an interpretable structure of the
split-rule, the lower-level achieves the most appropriate weights
(coefficients) of individual constituents of the rule to minimize the net
impurity of two resulting child nodes. The performance of the proposed
algorithm is demonstrated on a number of controlled test problems, existing
benchmark problems, and industrial problems. Results on two to 500-feature
problems are encouraging and open up further scopes of applying the proposed
approach to more challenging and complex classification tasks.
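The bilevel idea described above can be sketched in miniature: an upper level proposes a split-rule structure, and a lower level fits that structure's coefficients to minimize the weighted impurity of the two child nodes. Everything below is an illustrative assumption, not the authors' actual method: power-law terms stand in for the rule structure, exhaustive enumeration replaces the upper-level evolutionary search, and random search replaces the lower-level optimizer.

```python
# Minimal sketch of a bilevel nonlinear split-rule search (illustrative only).
import itertools
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(X, y, rule):
    """Weighted impurity of the two child nodes induced by rule(x) <= 0."""
    left = [yi for xi, yi in zip(X, y) if rule(xi) <= 0]
    right = [yi for xi, yi in zip(X, y) if rule(xi) > 0]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def lower_level(X, y, exponents, trials=200, seed=0):
    """Lower level: for a fixed rule structure (one exponent per feature),
    search for coefficients w and bias b that minimize child-node impurity.
    Random search stands in for the paper's lower-level optimizer."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for _ in range(trials):
        w = [rng.uniform(-1, 1) for _ in range(len(exponents))]
        b = rng.uniform(-1, 1)
        rule = lambda x, w=w, b=b: sum(
            wi * (x[i] ** ei) for i, (wi, ei) in enumerate(zip(w, exponents))
        ) + b
        imp = split_impurity(X, y, rule)
        if imp < best[0]:
            best = (imp, rule)
    return best

def upper_level(X, y, exponent_choices=(0, 1, 2)):
    """Upper level: enumerate simple power-law structures; each candidate is
    scored by its lower-level optimum. Enumeration replaces the paper's
    evolutionary structure search and is only feasible for tiny dimensions."""
    d = len(X[0])
    best = (float("inf"), None, None)
    for exps in itertools.product(exponent_choices, repeat=d):
        if all(e == 0 for e in exps):
            continue  # a constant rule cannot split the data
        imp, rule = lower_level(X, y, exps)
        if imp < best[0]:
            best = (imp, exps, rule)
    return best

# Toy dataset: the class depends on distance from the origin (a circular
# boundary), which no single linear split captures but a quadratic rule can.
X = [(0.1, 0.2), (0.2, -0.1), (-0.1, 0.1), (0.9, 0.8), (-0.8, 0.9), (0.7, -0.9)]
y = [0, 0, 0, 1, 1, 1]
imp, exps, rule = upper_level(X, y)
print("structure (exponents):", exps, "impurity:", round(imp, 3))
```

In the actual NLDT, this search runs at every conditional node, and interpretability is enforced by bounding the number of terms in the rule and the depth of the tree rather than by the crude enumeration used here.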
Related papers
- A Fast Interpretable Fuzzy Tree Learner [2.564905016909138]
Fuzzy rule-based systems have been mostly used in interpretable decision-making. Interpretability requires both sensible linguistic partitions and small rule-base sizes. We propose an adaptation of classical tree-based splitting algorithms from crisp rules to fuzzy trees.
arXiv Detail & Related papers (2025-12-12T14:51:07Z) - Decision Tree Embedding by Leaf-Means [11.318593165494724]
Decision Tree Embedding (DTE) is a fast and effective method that leverages the leaf partitions of a trained classification tree to construct an interpretable feature representation. By using the sample means within each leaf region as anchor points, DTE maps inputs into an embedding space defined by the tree's partition structure. We establish several population-level theoretical properties of DTE, including its preservation of conditional density under mild conditions.
arXiv Detail & Related papers (2025-12-01T15:57:33Z) - FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization [0.0]
We introduce a fast semi-supervised embedding framework that jointly optimizes three complementary objectives. On standard benchmarks, our method consistently achieves classification accuracy on par with or superior to state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-13T10:39:58Z) - Compact Rule-Based Classifier Learning via Gradient Descent [0.7874708385247353]
Rule-based models play a crucial role in scenarios that require transparency and accountable decision-making.
We introduce a new rule-based classifier trained using gradient descent, in which the user can control the maximum number and length of the rules.
For numerical partitions, the user can also control the partitions used with fuzzy sets, which also helps keep the number of partitions small.
arXiv Detail & Related papers (2025-02-03T14:13:39Z) - Conformal Prediction in Hierarchical Classification [18.730305100193927]
We extend the split conformal prediction framework to hierarchical classification, where prediction sets are commonly restricted to internal nodes of a predefined hierarchy.
The first algorithm returns internal nodes as prediction sets, while the second relaxes this restriction, using the notion of complexity.
Empirical evaluations on several benchmark datasets demonstrate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2025-01-31T11:10:19Z) - Obtaining Explainable Classification Models using Distributionally
Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between rule set sparsity and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z) - Efficient Link Prediction via GNN Layers Induced by Negative Sampling [92.05291395292537]
Graph neural networks (GNNs) for link prediction can loosely be divided into two broad categories.
First, node-wise architectures pre-compute individual embeddings for each node that are later combined by a simple decoder to make predictions.
Second, edge-wise methods rely on the formation of edge-specific subgraph embeddings to enrich the representation of pair-wise relationships.
arXiv Detail & Related papers (2023-10-14T07:02:54Z) - One-step Bipartite Graph Cut: A Normalized Formulation and Its
Application to Scalable Subspace Clustering [56.81492360414741]
We show how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity.
In this paper, we first characterize a novel one-step bipartite graph cut criterion with normalized constraints, and theoretically prove its equivalence to a trace problem.
We extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled.
arXiv Detail & Related papers (2023-05-12T11:27:20Z) - Regularized impurity reduction: Accurate decision trees with complexity
guarantees [20.170305081348328]
We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity.
The enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
arXiv Detail & Related papers (2022-08-23T13:15:19Z) - Set-valued prediction in hierarchical classification with constrained
representation complexity [4.258263831866309]
We focus on hierarchical multi-class classification problems, where valid sets correspond to internal nodes of the hierarchy.
We propose three methods and evaluate them on benchmark datasets.
arXiv Detail & Related papers (2022-03-13T15:13:19Z) - Multiple Classifiers Based Maximum Classifier Discrepancy for
Unsupervised Domain Adaptation [25.114533037440896]
We propose to extend the structure of two classifiers to multiple classifiers to further boost its performance.
We demonstrate that, on average, adopting the structure of three classifiers normally yields the best performance as a trade-off between the accuracy and efficiency.
arXiv Detail & Related papers (2021-08-02T03:00:13Z) - Binary Classification from Multiple Unlabeled Datasets via Surrogate Set
Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z) - Classifier-independent Lower-Bounds for Adversarial Robustness [13.247278149124757]
We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification.
We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem.
We derive explicit lower-bounds on the Bayes-optimal error in the case of the popular distance-based attacks.
arXiv Detail & Related papers (2020-06-17T16:46:39Z) - Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers.
We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.