Accelerated Componentwise Gradient Boosting using Efficient Data
Representation and Momentum-based Optimization
- URL: http://arxiv.org/abs/2110.03513v1
- Date: Thu, 7 Oct 2021 14:49:52 GMT
- Title: Accelerated Componentwise Gradient Boosting using Efficient Data
Representation and Momentum-based Optimization
- Authors: Daniel Schalk, Bernd Bischl, and David Rügamer
- Abstract summary: Componentwise boosting (CWB) builds on additive models as base learners to ensure interpretability.
One downside of CWB is its computational complexity in terms of memory and runtime.
We propose two techniques to overcome these issues without losing the properties of CWB.
- Score: 1.3159777131162964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Componentwise boosting (CWB), also known as model-based boosting, is a
variant of gradient boosting that builds on additive models as base learners to
ensure interpretability. CWB is thus often used in research areas where models
are employed as tools to explain relationships in data. One downside of CWB is
its computational complexity in terms of memory and runtime. In this paper, we
propose two techniques to overcome these issues without losing the properties
of CWB: feature discretization of numerical features and incorporating Nesterov
momentum into functional gradient descent. As the latter can be prone to early
overfitting, we also propose a hybrid approach that prevents a possibly
diverging gradient descent routine while ensuring faster convergence. We
perform extensive benchmarks on multiple simulated and real-world data sets to
demonstrate the improvements in runtime and memory consumption while
maintaining state-of-the-art estimation and prediction performance.
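To make the two proposed techniques more concrete, below is a minimal, illustrative Python sketch (not the authors' implementation): numeric features are discretized into quantile bins, and Nesterov momentum is added to the functional gradient descent loop of componentwise boosting. The squared-error loss, the per-bin-mean base learners, and all names such as `discretize` and `accelerated_cwb` are simplifying assumptions made for this example.

```python
import numpy as np

def discretize(x, n_bins=20):
    """Bin a numeric feature into quantile bins (illustrating the paper's first
    idea: replace raw numeric values by a small set of bin indices).
    Assumes the feature has at least two distinct values."""
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    return bins, len(edges) - 1

def bin_means(bins, n_bins, residuals):
    """Toy base learner: per-bin mean of the current pseudo-residuals
    (a stand-in for the penalized base learners used in CWB)."""
    sums = np.bincount(bins, weights=residuals, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return sums / np.maximum(counts, 1)

def accelerated_cwb(X, y, n_iter=100, lr=0.1, momentum=0.9, n_bins=20):
    """Componentwise boosting with Nesterov-style momentum on the function
    estimates (squared-error loss, so pseudo-residuals are y - prediction).
    Returns in-sample fitted values only; a real implementation would also
    store the selected base learners to predict on new data."""
    n, p = X.shape
    binned = [discretize(X[:, j], n_bins) for j in range(p)]
    f = np.full(n, y.mean())          # primary model sequence
    h = f.copy()                      # momentum ("look-ahead") sequence
    for _ in range(n_iter):
        residuals = y - h             # negative functional gradient at look-ahead
        # componentwise step: pick the single feature whose base learner
        # fits the pseudo-residuals best (smallest SSE)
        best_fit, best_sse = None, np.inf
        for bins, nb in binned:
            fit = bin_means(bins, nb, residuals)[bins]
            sse = np.sum((residuals - fit) ** 2)
            if sse < best_sse:
                best_fit, best_sse = fit, sse
        f_new = h + lr * best_fit                 # gradient step from look-ahead
        h = f_new + momentum * (f_new - f)        # Nesterov extrapolation
        f = f_new
    return f
```

The paper's hybrid variant additionally combines this accelerated routine with plain (non-accelerated) CWB updates to guard against the early overfitting and possible divergence mentioned in the abstract; that safeguard is omitted here for brevity.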
Related papers
- Accelerating AI Performance using Anderson Extrapolation on GPUs [2.114333871769023]
We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation.
By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing iterations to convergence.
We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
arXiv Detail & Related papers (2024-10-25T10:45:17Z)
- A Functional Extension of Semi-Structured Networks [2.482050942288848]
Semi-structured networks (SSNs) merge structures familiar from additive models with deep neural networks.
Inspired by large-scale datasets, this paper explores extending SSNs to functional data.
We propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability.
arXiv Detail & Related papers (2024-10-07T18:50:18Z)
- Decreasing the Computing Time of Bayesian Optimization using Generalizable Memory Pruning [56.334116591082896]
Running BO on high-dimensional or massive data sets becomes intractable due to its time complexity.
We show a wrapper of memory pruning and bounded optimization capable of being used with any surrogate model and acquisition function.
All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
arXiv Detail & Related papers (2023-09-08T14:05:56Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Convergent Boosted Smoothing for Modeling Graph Data with Tabular Node Features [46.052312251801]
We propose a framework for iterating boosting with graph propagation steps.
Our approach is anchored in a principled meta loss function.
Across a variety of non-iid graph datasets, our method achieves comparable or superior performance.
arXiv Detail & Related papers (2021-10-26T04:53:12Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- High-Dimensional Bayesian Optimization via Tree-Structured Additive Models [40.497123136157946]
We consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function.
Our goal is to lower the computational resources required and facilitate faster model learning.
We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.
arXiv Detail & Related papers (2020-12-24T03:56:44Z)
- Population Gradients improve performance across data-sets and architectures in object classification [6.17047113475566]
We present a new method to calculate gradients while training Neural Networks (NNs).
It significantly improves final performance across architectures, data sets, hyperparameter values, training lengths, and model sizes.
Besides being effective in the wide array of situations we have tested, the increase in performance (e.g., F1) is as high as or higher than that of other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.