Interpretable Representation Learning for Additive Rule Ensembles
- URL: http://arxiv.org/abs/2506.20927v1
- Date: Thu, 26 Jun 2025 01:24:08 GMT
- Title: Interpretable Representation Learning for Additive Rule Ensembles
- Authors: Shahrzad Behzadimanesh, Pierre Le Bodic, Geoffrey I. Webb, Mario Boley
- Abstract summary: Small additive ensembles of symbolic rules offer interpretable prediction models. We extend classical rule ensembles by introducing learnable sparse linear transformations of input variables. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression.
- Score: 6.753916556870879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Small additive ensembles of symbolic rules offer interpretable prediction models. Traditionally, these ensembles use rule conditions based on conjunctions of simple threshold propositions $x \geq t$ on a single input variable $x$ and threshold $t$, resulting geometrically in axis-parallel polytopes as decision regions. While this form ensures a high degree of interpretability for individual rules and can be learned efficiently using the gradient boosting approach, it relies on having access to a curated set of expressive and ideally independent input features so that a small ensemble of axis-parallel regions can describe the target variable well. Absent such features, reaching sufficient accuracy requires increasing the number and complexity of individual rules, which diminishes the interpretability of the model. Here, we extend classical rule ensembles by introducing logical propositions with learnable sparse linear transformations of input variables, i.e., propositions of the form $\mathbf{x}^\mathrm{T}\mathbf{w} \geq t$, where $\mathbf{w}$ is a learnable sparse weight vector, enabling decision regions as general polytopes with oblique faces. We propose a learning method using sequential greedy optimization based on an iteratively reweighted formulation of logistic regression. Experimental results demonstrate that the proposed method efficiently constructs rule ensembles with the same test risk as state-of-the-art methods while significantly reducing model complexity across ten benchmark datasets.
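To make the geometry and the greedy step concrete, here is a minimal NumPy sketch of an oblique rule ensemble. This is an illustration only, not the authors' implementation: the names ObliqueRule and fit_oblique_proposition are hypothetical, and a proximal-gradient step on an L1-penalized logistic loss stands in for the paper's iteratively reweighted formulation.

```python
import numpy as np

class ObliqueRule:
    """One rule: fires when every proposition x^T w_j >= t_j holds,
    so its decision region is a polytope with oblique faces."""

    def __init__(self, weights, thresholds, coef):
        self.weights = np.atleast_2d(weights)     # (k, d): one sparse row per proposition
        self.thresholds = np.asarray(thresholds)  # (k,)
        self.coef = coef                          # additive contribution when the rule fires

    def fires(self, X):
        # Conjunction of oblique threshold propositions x^T w_j >= t_j.
        return np.all(X @ self.weights.T >= self.thresholds, axis=1)


def ensemble_score(rules, X, intercept=0.0):
    """Additive ensemble: sum the coefficients of all firing rules."""
    score = np.full(len(X), intercept)
    for rule in rules:
        score += rule.coef * rule.fires(X)
    return score


def fit_oblique_proposition(X, y, base_score, l1=0.1, lr=0.5, n_iter=200):
    """Greedily fit one sparse halfspace x^T w >= t on top of the current
    ensemble score, via proximal gradient on an L1-penalized logistic loss
    (an illustrative stand-in for the paper's iteratively reweighted scheme)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(base_score + X @ w + b)))    # predicted probabilities
        w -= lr * (X.T @ (p - y) / n)                          # logistic-loss gradient step
        b -= lr * np.mean(p - y)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)  # soft-threshold keeps w sparse
    return w, -b  # the proposition reads x^T w >= t with t = -b
```

Setting each weight row of a rule to a standard basis vector $\mathbf{e}_i$ recovers the classical axis-parallel proposition $x_i \geq t$, so the oblique form strictly generalizes threshold rules; the soft-thresholding step is what keeps each learned $\mathbf{w}$ sparse and therefore readable.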
Related papers
- Revisiting Randomization in Greedy Model Search [16.15551706774035]
We propose and analyze an ensemble of greedy forward selection estimators that are randomized by feature subsampling. We design a novel implementation based on dynamic programming that greatly improves its computational efficiency. Contrary to the prevailing belief that randomized ensembling is analogous to shrinkage, we show that it can simultaneously reduce training error and degrees of freedom.
arXiv Detail & Related papers (2025-06-18T17:13:53Z)
- The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective [55.15192437680943]
We study the sample complexity of online reinforcement learning in the general setting of nonlinear dynamical systems with continuous state and action spaces. Our algorithm achieves a policy regret of $\mathcal{O}(N\epsilon^2 + \ln(m(\epsilon))/\epsilon^2)$, where $N$ is the time horizon. In the special case where the dynamics are parametrized by a compact and real-valued set of parameters, we prove a policy regret of $\mathcal{O}(\sqrt{N})$.
arXiv Detail & Related papers (2025-01-27T10:01:28Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Sparsity via Sparse Group $k$-max Regularization [22.05774771336432]
In this paper, we propose a novel and concise regularization, namely the sparse group $k$-max regularization.
We verify the effectiveness and flexibility of the proposed method through numerical experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-02-13T14:41:28Z)
- Obtaining Explainable Classification Models using Distributionally Robust Optimization [12.511155426574563]
We study generalized linear models constructed using sets of feature value rules.
An inherent trade-off exists between rule set sparsity and its prediction accuracy.
We propose a new formulation to learn an ensemble of rule sets that simultaneously addresses these competing factors.
arXiv Detail & Related papers (2023-11-03T15:45:34Z)
- Conformalization of Sparse Generalized Linear Models [2.1485350418225244]
The conformal prediction method estimates a confidence set for $y_{n+1}$ that is valid for any finite sample size.
Although attractive, computing such a set is computationally infeasible in most regression problems.
We show how our path-following algorithm accurately approximates conformal prediction sets.
arXiv Detail & Related papers (2023-07-11T08:36:12Z)
- Learning Graphical Factor Models with Riemannian Optimization [70.13748170371889]
This paper proposes a flexible algorithmic framework for graph learning under low-rank structural constraints.
The problem is expressed as penalized maximum likelihood estimation of an elliptical distribution.
We leverage geometries of positive definite matrices and positive semi-definite matrices of fixed rank that are well suited to elliptical models.
arXiv Detail & Related papers (2022-10-21T13:19:45Z)
- Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation [88.14365009076907]
Iterative refinement is a useful paradigm for representation learning.
We develop an implicit differentiation approach that improves the stability and tractability of training.
arXiv Detail & Related papers (2022-07-02T10:00:35Z)
- Structured Reordering for Modeling Latent Alignments in Sequence Transduction [86.94309120789396]
We present an efficient dynamic programming algorithm performing exact marginal inference of separable permutations.
The resulting seq2seq model exhibits better systematic generalization than standard models on synthetic problems and NLP tasks.
arXiv Detail & Related papers (2021-06-06T21:53:54Z)
- Ridge regression with adaptive additive rectangles and other piecewise functional templates [0.0]
We propose an $L_2$-based penalization algorithm for functional linear regression models.
We show how our algorithm alternates between approximating a suitable template and solving a convex ridge-like problem.
arXiv Detail & Related papers (2020-11-02T15:28:54Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)