Light Gradient Boosting Machine as a Regression Method for Quantitative
Structure-Activity Relationships
- URL: http://arxiv.org/abs/2105.08626v1
- Date: Wed, 28 Apr 2021 20:19:44 GMT
- Title: Light Gradient Boosting Machine as a Regression Method for Quantitative
Structure-Activity Relationships
- Authors: Robert P. Sheridan, Andy Liaw, Matthew Tudor
- Abstract summary: We compare Light Gradient Boosting Machine (LightGBM) to random forest, single-task deep neural nets, and Extreme Gradient Boosting (XGBoost) on 30 in-house data sets.
LightGBM makes predictions about as accurate as single-task deep neural nets, but is roughly 1000-fold faster than random forest and about 4-fold faster than XGBoost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the pharmaceutical industry, where it is common to generate many QSAR
models with large numbers of molecules and descriptors, the best QSAR methods
are those that can generate the most accurate predictions but that are also
insensitive to hyperparameters and are computationally efficient. Here we
compare Light Gradient Boosting Machine (LightGBM) to random forest,
single-task deep neural nets, and Extreme Gradient Boosting (XGBoost) on 30
in-house data sets. While any boosting algorithm has many adjustable
hyperparameters, we can define a set of standard hyperparameters at which
LightGBM makes predictions about as accurate as single-task deep neural nets,
but is roughly 1000-fold faster than random forest and ~4-fold faster than
XGBoost in terms of total computational time for the largest models. Another
very useful feature of LightGBM is that it includes a native method for
estimating prediction intervals.
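The two properties highlighted above, a fixed "standard" hyperparameter set and native prediction intervals via the quantile objective, can be sketched in a few lines of Python. This is a minimal illustration on synthetic data; the hyperparameter values and descriptor matrix below are assumptions for the sketch, not the paper's published settings.
```python
# Hedged sketch: LightGBM regression with one fixed ("standard") hyperparameter
# set, plus prediction intervals via LightGBM's native quantile objective.
# The specific values below are illustrative assumptions, not the paper's settings.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))                    # stand-in for molecular descriptors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=2000)  # stand-in activity
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

params = dict(n_estimators=500, num_leaves=31, learning_rate=0.05,
              min_child_samples=20)                # assumed "standard" settings

point = lgb.LGBMRegressor(objective="regression", **params).fit(X_tr, y_tr)
lo = lgb.LGBMRegressor(objective="quantile", alpha=0.1, **params).fit(X_tr, y_tr)
hi = lgb.LGBMRegressor(objective="quantile", alpha=0.9, **params).fit(X_tr, y_tr)

y_hat = point.predict(X_te)                        # point predictions
interval = np.stack([lo.predict(X_te), hi.predict(X_te)], axis=1)  # ~80% interval
```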
Related papers
- From Point to probabilistic gradient boosting for claim frequency and severity prediction [1.3812010983144802]
We present a unified notation for, and contrast, all existing point and probabilistic gradient boosting algorithms for decision trees.
We compare their performance on five publicly available datasets for claim frequency and severity, of various sizes and comprising different numbers of (high-cardinality) categorical variables.
arXiv Detail & Related papers (2024-12-19T14:50:10Z)
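For the claim frequency/severity entry above, here is a minimal sketch of point (non-probabilistic) gradient boosting with distribution-aware objectives, using LightGBM's built-in Poisson and Gamma objectives on synthetic data. The paper's own models and datasets are not reproduced here.
```python
# Hedged sketch: frequency/severity modeling with distribution-aware boosting
# objectives. LightGBM's "poisson" (counts) and "gamma" (positive severities)
# objectives stand in for the paper's broader point-vs-probabilistic comparison.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 8))                           # synthetic policy features
freq = rng.poisson(lam=np.exp(0.3 * X[:, 0]))            # claim counts
sev = rng.gamma(shape=2.0, scale=np.exp(0.2 * X[:, 1]))  # positive claim sizes

freq_model = lgb.LGBMRegressor(objective="poisson").fit(X, freq)
sev_model = lgb.LGBMRegressor(objective="gamma").fit(X, sev + 1e-6)  # keep labels > 0

expected_cost = freq_model.predict(X) * sev_model.predict(X)  # frequency x severity
```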
- Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse fine-tuning method that maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
arXiv Detail & Related papers (2024-01-29T18:43:49Z)
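For the SpIEL entry above, a toy reconstruction of the stated idea: fine-tuning state stored as an array of parameter indices plus deltas relative to the pretrained values. The index and delta values are hypothetical, and this is not the authors' implementation.
```python
# Hedged sketch reconstructed from the abstract: sparse fine-tuning state kept
# as (indices, deltas) rather than a dense copy of the updated weights.
import numpy as np

pretrained = np.random.default_rng(2).normal(size=1_000_000)  # flat param vector

# Hypothetical sparse fine-tuning state: which entries moved, and by how much.
indices = np.array([3, 17, 42_000, 999_999])
deltas = np.array([0.01, -0.02, 0.005, 0.03])

def apply_sparse_deltas(params, idx, d):
    """Materialize fine-tuned weights from a sparse set of per-index offsets."""
    out = params.copy()
    out[idx] += d
    return out

finetuned = apply_sparse_deltas(pretrained, indices, deltas)
```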
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
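For the WTA-CRS entry above, a sketch of the classical column-row (CR) sampling estimator of a matrix product, the unbiased family the abstract refers to. WTA-CRS is a lower-variance member of this family; its exact sampling rule is not reproduced here.
```python
# Hedged sketch: unbiased column-row sampling estimate of A @ B.
import numpy as np

def cr_sample_matmul(A, B, k, rng):
    """Unbiased estimate of A @ B from k sampled column-row outer products."""
    n = A.shape[1]
    # Importance weights proportional to |A[:, i]| * |B[i, :]| (standard choice).
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=k, p=p)
    # Scale each sampled outer product by 1 / (k * p_i) so the mean equals A @ B.
    return sum(np.outer(A[:, i], B[i, :]) / (k * p[i]) for i in idx)

rng = np.random.default_rng(3)
A, B = rng.normal(size=(64, 256)), rng.normal(size=(256, 32))
approx = cr_sample_matmul(A, B, k=64, rng=rng)   # E[approx] == A @ B
```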
- Quantile Extreme Gradient Boosting for Uncertainty Quantification [1.7685947618629572]
Extreme Gradient Boosting (XGBoost) is one of the most popular machine learning (ML) methods.
We propose an enhancement to XGBoost, QXGBoost, in which a modified quantile regression is used as the objective function to estimate uncertainty.
Our proposed method performed comparably to or better than the uncertainty estimates generated by regular and quantile light gradient boosting.
arXiv Detail & Related papers (2023-04-23T19:46:19Z)
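For the QXGBoost entry above, a sketch of how a quantile-style objective can be plugged into XGBoost as a custom objective. The paper's modified quantile loss is not reproduced; this uses the plain pinball gradient with a constant surrogate Hessian, a common workaround for the pinball loss's zero second derivative.
```python
# Hedged sketch: custom pinball (quantile) objective for xgboost.train.
import numpy as np
import xgboost as xgb

def pinball_objective(tau):
    def obj(preds, dtrain):
        u = dtrain.get_label() - preds
        grad = np.where(u > 0, -tau, 1.0 - tau)   # d(pinball)/d(pred)
        hess = np.full_like(preds, 1.0)           # constant surrogate curvature
        return grad, hess
    return obj

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.3, size=1000)
dtrain = xgb.DMatrix(X, label=y)

upper = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
                  num_boost_round=200, obj=pinball_objective(0.9))
q90 = upper.predict(dtrain)    # ~90th-percentile predictions
```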
- Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization [0.0]
We present a novel algorithm that estimates the smoothness and length-scale parameters of the Matérn kernel in order to improve the robustness of the resulting prediction uncertainties.
We achieve improved uncertainty quantification (UQ) over leave-one-out likelihood while maintaining a high degree of scalability, as demonstrated in numerical experiments.
arXiv Detail & Related papers (2022-09-22T19:23:37Z)
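For the Matérn-kernel entry above, a short sketch of the Matérn covariance function whose smoothness (nu) and length-scale (ell) parameters the cited algorithm estimates; the coverage-regularized estimation itself is omitted.
```python
# Hedged sketch: the Matern covariance, shown only to fix notation.
import numpy as np
from scipy.special import gamma, kv

def matern(r, nu=1.5, ell=1.0, sigma2=1.0):
    """Matern covariance k(r) for distances r >= 0."""
    r = np.asarray(r, dtype=float)
    scaled = np.sqrt(2 * nu) * np.maximum(r, 1e-12) / ell   # avoid kv(., 0)
    return sigma2 * (2 ** (1 - nu) / gamma(nu)) * scaled**nu * kv(nu, scaled)

print(matern(np.array([0.1, 1.0, 2.0]), nu=2.5, ell=0.8))
```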
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions on the parameters are made through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- AUTOMATA: Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning [72.54359545547904]
We propose a gradient-based subset selection framework for hyper-parameter tuning.
We show that using gradient-based data subsets for hyper-parameter tuning achieves significantly faster turnaround times and speedups of 3×-30×.
arXiv Detail & Related papers (2022-03-15T19:25:01Z)
- Towards Robust and Automatic Hyper-Parameter Tunning [39.04604349338802]
We introduce a new class of HPO methods and explore how the low-rank factorization of the intermediate layers of a convolutional network can be used to define an analytical response surface.
We quantify how this surface behaves as a surrogate for model performance and show that it can be optimized with a trust-region search algorithm, which we call autoHyper.
arXiv Detail & Related papers (2021-11-28T05:27:34Z)
- HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization [1.6114012813668934]
We propose HyP-ABC, an automatic hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach.
Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned.
arXiv Detail & Related papers (2021-09-11T16:45:39Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit: it can simultaneously reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: the random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
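For the sparse-gradients entry above, one plausible reading of a random-top-k operator: keep the k1 largest-magnitude coordinates exactly and take an unbiased, rescaled random sample of k2 of the rest. The paper's exact operator may differ; treat this as an illustrative assumption.
```python
# Hedged sketch: a random-top-k style sparsifier (assumed form, see note above).
import numpy as np

def random_top_k(g, k1, k2, rng):
    """Sparsify gradient g: exact top-k1 entries + rescaled random k2 entries."""
    out = np.zeros_like(g)
    top = np.argsort(np.abs(g))[-k1:]               # largest-magnitude coords
    out[top] = g[top]
    rest = np.setdiff1d(np.arange(g.size), top)
    pick = rng.choice(rest, size=k2, replace=False)
    out[pick] = g[pick] * (rest.size / k2)          # rescale so E[out] matches g
    return out

rng = np.random.default_rng(5)
g = rng.normal(size=100)
sparse_g = random_top_k(g, k1=10, k2=10, rng=rng)
```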
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.