Accelerated Componentwise Gradient Boosting using Efficient Data
Representation and Momentum-based Optimization
- URL: http://arxiv.org/abs/2110.03513v1
- Date: Thu, 7 Oct 2021 14:49:52 GMT
- Title: Accelerated Componentwise Gradient Boosting using Efficient Data
Representation and Momentum-based Optimization
- Authors: Daniel Schalk, Bernd Bischl, and David Rügamer
- Abstract summary: Componentwise boosting (CWB) builds on additive models as base learners to ensure interpretability.
One downside of CWB is its computational complexity in terms of memory and runtime.
We propose two techniques to overcome these issues without losing the properties of CWB.
- Score: 1.3159777131162964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Componentwise boosting (CWB), also known as model-based boosting, is a
variant of gradient boosting that builds on additive models as base learners to
ensure interpretability. CWB is thus often used in research areas where models
are employed as tools to explain relationships in data. One downside of CWB is
its computational complexity in terms of memory and runtime. In this paper, we
propose two techniques to overcome these issues without losing the properties
of CWB: feature discretization of numerical features and incorporating Nesterov
momentum into functional gradient descent. As the latter can be prone to early
overfitting, we also propose a hybrid approach that prevents a possibly
diverging gradient descent routine while ensuring faster convergence. We
perform extensive benchmarks on multiple simulated and real-world data sets to
demonstrate the improvements in runtime and memory consumption while
maintaining state-of-the-art estimation and prediction performance.
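To make the two proposed techniques more concrete, below is a minimal, illustrative Python sketch (not the authors' implementation): numeric features are discretized into quantile bins, and Nesterov momentum is added to the functional gradient descent loop of componentwise boosting. The squared-error loss, the per-bin-mean base learners, and all names such as `discretize` and `accelerated_cwb` are simplifying assumptions made for this example.

```python
import numpy as np

def discretize(x, n_bins=20):
    """Bin a numeric feature into quantile bins (illustrating the paper's first
    idea: replace raw numeric values by a small set of bin indices).
    Assumes the feature has at least two distinct values."""
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)
    return bins, len(edges) - 1

def bin_means(bins, n_bins, residuals):
    """Toy base learner: per-bin mean of the current pseudo-residuals
    (a stand-in for the penalized base learners used in CWB)."""
    sums = np.bincount(bins, weights=residuals, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return sums / np.maximum(counts, 1)

def accelerated_cwb(X, y, n_iter=100, lr=0.1, momentum=0.9, n_bins=20):
    """Componentwise boosting with Nesterov-style momentum on the function
    estimates (squared-error loss, so pseudo-residuals are y - prediction).
    Returns in-sample fitted values only; a real implementation would also
    store the selected base learners to predict on new data."""
    n, p = X.shape
    binned = [discretize(X[:, j], n_bins) for j in range(p)]
    f = np.full(n, y.mean())          # primary model sequence
    h = f.copy()                      # momentum ("look-ahead") sequence
    for _ in range(n_iter):
        residuals = y - h             # negative functional gradient at look-ahead
        # componentwise step: pick the single feature whose base learner
        # fits the pseudo-residuals best (smallest SSE)
        best_fit, best_sse = None, np.inf
        for bins, nb in binned:
            fit = bin_means(bins, nb, residuals)[bins]
            sse = np.sum((residuals - fit) ** 2)
            if sse < best_sse:
                best_fit, best_sse = fit, sse
        f_new = h + lr * best_fit                 # gradient step from look-ahead
        h = f_new + momentum * (f_new - f)        # Nesterov extrapolation
        f = f_new
    return f
```

The paper's hybrid variant additionally combines this accelerated routine with plain (non-accelerated) CWB updates to guard against the early overfitting and possible divergence mentioned in the abstract; that safeguard is omitted here for brevity.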
Related papers
- Accelerating AI Performance using Anderson Extrapolation on GPUs [2.114333871769023]
We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation.
By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing iterations to convergence.
We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
arXiv Detail & Related papers (2024-10-25T10:45:17Z)
- A Functional Extension of Semi-Structured Networks [2.482050942288848]
Semi-structured networks (SSNs) merge structures familiar from additive models with deep neural networks.
Inspired by large-scale datasets, this paper explores extending SSNs to functional data.
We propose a functional SSN method that retains the advantageous properties of classical functional regression approaches while also improving scalability.
arXiv Detail & Related papers (2024-10-07T18:50:18Z)
- Decreasing the Computing Time of Bayesian Optimization using Generalizable Memory Pruning [56.334116591082896]
Running BO on high-dimensional or massive data sets becomes intractable due to its time complexity.
We show a wrapper of memory pruning and bounded optimization capable of being used with any surrogate model and acquisition function.
All model implementations are run on the MIT Supercloud state-of-the-art computing hardware.
arXiv Detail & Related papers (2023-09-08T14:05:56Z)
- Generalizing Backpropagation for Gradient-Based Interpretability [103.2998254573497]
We show that the gradient of a model is a special case of a more general formulation using semirings.
This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics.
arXiv Detail & Related papers (2023-07-06T15:19:53Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Convergent Boosted Smoothing for Modeling Graph Data with Tabular Node Features [46.052312251801]
We propose a framework for iterating boosting with graph propagation steps.
Our approach is anchored in a principled meta loss function.
Across a variety of non-iid graph datasets, our method achieves comparable or superior performance.
arXiv Detail & Related papers (2021-10-26T04:53:12Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called Gradient Boosted Binary Histogram Ensemble (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- High-Dimensional Bayesian Optimization via Tree-Structured Additive Models [40.497123136157946]
We consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function.
Our goal is to lower the computational resources required and facilitate faster model learning.
We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.
arXiv Detail & Related papers (2020-12-24T03:56:44Z)
- Population Gradients improve performance across data-sets and architectures in object classification [6.17047113475566]
We present a new method to calculate gradients while training Neural Networks (NNs).
It significantly improves final performance across architectures, data sets, hyperparameter values, training lengths, and model sizes.
Besides being effective in the wide array of situations we have tested, the increase in performance (e.g., F1) is as high as or higher than that of other widespread performance-improving methods.
arXiv Detail & Related papers (2020-10-23T09:40:23Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.