Related papers: Selecting Hyperparameters for Tree-Boosting

Selecting Hyperparameters for Tree-Boosting

URL: http://arxiv.org/abs/2602.05786v1
Date: Thu, 05 Feb 2026 15:44:42 GMT
Title: Selecting Hyperparameters for Tree-Boosting
Authors: Floris Jan Koster, Fabio Sigrist,
Abstract summary: Tree-boosting is a widely used machine learning technique for tabular data.<n>We empirically compare several popular methods for hyperparameter optimization for tree-boosting.<n>We find that the SMAC method clearly outperforms all the other considered methods.
Score: 6.789370732159177
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy is critically dependent on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search using $59$ regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials larger than $100$ is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small set of hyperparameters that is more important than others, and (iv) choosing the number of boosting iterations using early stopping yields more accurate results compared to including it in the search space for regression tasks.

Related papers

Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining [59.369484219304866]
We conduct an unprecedented empirical investigation training over 3,700 Large Language Models (LLMs) from scratch across 100 trillion tokens.<n>We establish a universal Scaling Law for hyperparameter optimization in LLM Pre-training, called Step Law.<n>Our estimated optima deviates from the global best performance found via exhaustive search by merely 0.094% on the test set.
arXiv Detail & Related papers (2025-03-06T18:58:29Z)
Scaling Exponents Across Parameterizations and Optimizers [94.54718325264218]
We propose a new perspective on parameterization by investigating a key assumption in prior work. Our empirical investigation includes tens of thousands of models trained with all combinations of threes. We find that the best learning rate scaling prescription would often have been excluded by the assumptions in prior work.
arXiv Detail & Related papers (2024-07-08T12:32:51Z)
Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are used through the use of plug-in empirical Bayes estimates. The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
Automating DBSCAN via Deep Reinforcement Learning [73.82740568765279]
We propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process. The framework consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively.
arXiv Detail & Related papers (2022-08-09T04:40:11Z)
Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. Our results show that our method is able to locate good hyper parameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper- parameter tuning. We show that using gradient-based data subsets for hyper- parameter tuning achieves significantly faster turnaround times and speedups of 3$times$-30$times$.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO method and explore how the low-rank factorization of intermediate layers of a convolutional network can be used to define an analytical response surface. We quantify how this surface behaves as a surrogate to model performance and can be solved using a trust-region search algorithm, which we call autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z)
Pre-trained Gaussian Processes for Bayesian Optimization [24.730678780782647]
We propose a new pre-training based BO framework named HyperBO. We show bounded posterior predictions and near-zero regrets for HyperBO without assuming the "ground truth" GP prior is known.
arXiv Detail & Related papers (2021-09-16T20:46:26Z)
HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization [1.6114012813668934]
We propose HyP-ABC, an automatic hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned.
arXiv Detail & Related papers (2021-09-11T16:45:39Z)
Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication. We prove that preconditioning has an additional benefit that has been previously unexplored. It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
Light Gradient Boosting Machine as a Regression Method for Quantitative Structure-Activity Relationships [0.0]
We compare Light Gradient Boosting Machine (LightGBM) to random forest, single-task deep neural nets, and Extreme Gradient Boosting (XGBoost) on 30 in-house data sets. LightGBM makes predictions about as accurate as single-task deep neural nets, but is a factor of 1000-fold faster than random forest and 4-fold faster than XGBoost.
arXiv Detail & Related papers (2021-04-28T20:19:44Z)
How to tune the RBF SVM hyperparameters?: An empirical evaluation of 18 search algorithms [4.394728504061753]
We propose 18 proposed search algorithms for 115 real-life binary data sets. We find that Parss better searches with only a slight increase in time with respect to the same tree with with respect to the grid. We also find that there are no significant differences among the different procedures to the best set of data when more than one is found by the search algorithms.
arXiv Detail & Related papers (2020-08-26T16:28:48Z)
Weighted Random Search for CNN Hyperparameter Optimization [0.0]
We introduce the weighted Random Search (WRS) method, a combination of Random Search (RS) and probabilistic greedy. The criterion is the classification accuracy achieved within the same number of tested combinations of hyperparameter values. According to our experiments, the WRS algorithm outperforms the other methods.
arXiv Detail & Related papers (2020-03-30T09:40:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.