A better method to enforce monotonic constraints in regression and
classification trees
- URL: http://arxiv.org/abs/2011.00986v1
- Date: Mon, 2 Nov 2020 14:04:21 GMT
- Title: A better method to enforce monotonic constraints in regression and
classification trees
- Authors: Charles Auguste (IMI), Sean Malory, Ivan Smirnov
- Abstract summary: We present two new ways of enforcing monotone constraints in regression and classification trees.
One yields better results than the current LightGBM, and has a similar computation time.
The other one yields even better results, but is much slower than the current LightGBM.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this report we present two new ways of enforcing monotone constraints in
regression and classification trees. One yields better results than the current
LightGBM, and has a similar computation time. The other one yields even better
results, but is much slower than the current LightGBM. We also propose a
heuristic that accounts for the fact that greedily splitting a tree by choosing a
monotone split with respect to its immediate gain is far from optimal. Then, we
compare the results with the current implementation of the constraints in the
LightGBM library, using the well-known public Adult dataset. Throughout the
report, we mostly focus on the implementation of our methods that we made for
the LightGBM library, even though they are general and could be implemented in
any regression or classification tree. The best method we propose (a smarter
way to split the tree coupled with a penalization of monotone splits)
consistently beats the current implementation of LightGBM. With small or
average-sized trees, the loss reduction can be as high as 1% in the early stages of
training and decreases to around 0.1% at the loss peak for the Adult dataset.
The results would be even better with larger trees. In our experiments, we
didn't do much tuning of the regularization parameters, and we wouldn't be
surprised if further tuning increased the performance of our methods on test sets.
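For readers who want to try constrained splitting in practice: the methods and the penalization heuristic above were contributed to LightGBM, where they appear to surface as the monotone_constraints_method ("basic", "intermediate", "advanced") and monotone_penalty parameters. A minimal sketch, assuming a recent LightGBM release and a toy synthetic dataset:

```python
# A minimal sketch of monotone-constrained boosting in LightGBM, assuming a
# release that exposes monotone_constraints_method and monotone_penalty.
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
# Target increases in feature 0, decreases in feature 1; feature 2 is free.
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=1000)

params = {
    "objective": "regression",
    # 1 = increasing, -1 = decreasing, 0 = unconstrained (one entry per feature)
    "monotone_constraints": [1, -1, 0],
    # "basic" is the older, over-constraining method; "intermediate" and
    # "advanced" are the less constraining but slower alternatives
    "monotone_constraints_method": "advanced",
    # Penalize monotone splits near the root, where immediate gain most
    # overstates a split's long-run value; 0 (the default) disables this
    "monotone_penalty": 2.0,
    "verbose": -1,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```

The exact option names are a property of the library rather than of this report, so treat the snippet as illustrative of the technique, not as the report's reference implementation.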
Related papers
- Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles [45.962492329047215]
We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model.
We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles.
arXiv Detail & Related papers (2024-08-28T23:15:46Z)
- Adapting tree-based multiple imputation methods for multi-level data? A simulation study [0.0]
This simulation study evaluates the effectiveness of multiple imputation techniques for multilevel data.
It compares the performance of traditional Multiple Imputation by Chained Equations (MICE) with tree-based methods.
arXiv Detail & Related papers (2024-01-25T13:12:50Z)
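For intuition on what "tree-based methods" means in the entry above: a MICE-style chained-equation imputer can use a tree ensemble as the per-feature model. A minimal sketch, assuming scikit-learn (the study itself may use different tooling):

```python
# Tree-based MICE-style imputation sketched with scikit-learn's
# IterativeImputer; illustrative only, not the study's exact setup.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.uniform(size=X.shape) < 0.1] = np.nan  # introduce missingness

# Chained-equation imputation where each feature is modeled by a tree ensemble;
# for MICE-style multiple imputation, repeat with different estimator seeds.
imputer = IterativeImputer(
    estimator=ExtraTreesRegressor(n_estimators=50, random_state=0),
    max_iter=10,
    random_state=0,
)
X_completed = imputer.fit_transform(X)
```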
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel model for semi-supervised/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z)
- GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing [78.92205914422925]
We propose a clustering algorithm that combines multi-granularity granular-balls with a minimum spanning tree (MST).
We first construct coarse-grained granular-balls, then use the granular-balls and the MST to implement a clustering method based on "large-scale priority".
Experimental results on several datasets demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-03-02T09:04:35Z)
- Scalable Optimal Multiway-Split Decision Trees with Constraints [3.092691764363848]
We present a novel path-based MIP formulation where the number of decision variables is independent of the training set size $N$.
Our framework produces a multiway-split tree which is more interpretable than the typical binary-split trees due to its shorter rules.
We present results on datasets containing up to 1,008,372 samples, while existing MIP-based decision tree models do not scale well beyond a few thousand data points.
arXiv Detail & Related papers (2023-02-14T03:48:48Z)
- SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We show for the first time the role of a tree expansion policy in mitigating the variance of the policy gradient.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z)
- SoftTreeMax: Policy Gradient with Tree Search [72.9513807133171]
We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance, with faster run-time, compared with distributed PPO.
arXiv Detail & Related papers (2022-09-28T09:55:47Z)
- Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves a faster-than-$n^{-1/4}$ rate of $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z)
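The early stopping invoked in the entry above is the usual validation-based rule for boosting. A minimal sketch of that mechanism, assuming scikit-learn (illustrative only, not the paper's theoretical construction):

```python
# Validation-based early stopping for gradient boosted trees, sketched with
# scikit-learn; illustrates the stopping rule, not the paper's exact setup.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)

model = GradientBoostingRegressor(
    n_estimators=1000,        # upper bound on boosting rounds
    validation_fraction=0.2,  # held-out split used to monitor the loss
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=0,
)
model.fit(X, y)
print("rounds actually used:", model.n_estimators_)
```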
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.