GBHT: Gradient Boosting Histogram Transform for Density Estimation
- URL: http://arxiv.org/abs/2106.05738v1
- Date: Thu, 10 Jun 2021 13:40:28 GMT
- Title: GBHT: Gradient Boosting Histogram Transform for Density Estimation
- Authors: Jingyi Cui, Hanyuan Hang, Yisen Wang, Zhouchen Lin
- Abstract summary: We propose a density estimation algorithm called \textit{Gradient Boosting Histogram Transform} (GBHT).
We make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems.
- Score: 73.94900378709023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a density estimation algorithm called
\textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the
\textit{Negative Log Likelihood} as the loss function to make the boosting
procedure applicable to unsupervised tasks. From a learning theory
viewpoint, we first prove fast convergence rates for GBHT with the smoothness
assumption that the underlying density function lies in the space
$C^{0,\alpha}$. Then, when the target density function lies in the space
$C^{1,\alpha}$, we present an upper bound for GBHT which is smaller than the
lower bound of its corresponding base learner, in the sense of convergence
rates. To the best of our knowledge, we make the first attempt to theoretically
explain why boosting can enhance the performance of its base learners for
density estimation problems. In experiments, we not only conduct performance
comparisons with the widely used KDE, but also apply GBHT to anomaly detection
to showcase a further application of GBHT.
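To make the procedure above concrete, here is a minimal Python sketch of boosting histogram density estimators under the negative log-likelihood loss. It is an illustration under simplifying assumptions, not the authors' implementation: it works in one dimension, replaces the full histogram transform (random rotation, stretching, and translation) with a randomly shifted and stretched grid, and uses a stagewise mixture update so the running estimate stays a valid density. All function names and parameter choices are illustrative.

```python
import numpy as np

def fit_histogram_density(x, weights, n_bins=16, rng=None):
    """Weighted 1-D histogram density on a randomly shifted/stretched grid.

    The random shift and stretch loosely imitate the histogram transform used
    as the base learner (which, in d dimensions, also involves random
    rotations); this is only an illustrative stand-in.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = float(x.min()), float(x.max())
    width = (hi - lo) / n_bins
    shift = rng.uniform(0.0, width)        # random translation of the grid
    stretch = rng.uniform(0.8, 1.2)        # random stretching of the bin width
    step = stretch * width
    n_eff = int(np.ceil((hi - (lo - shift)) / step)) + 1
    edges = (lo - shift) + step * np.arange(n_eff + 1)
    counts, edges = np.histogram(x, bins=edges, weights=weights, density=True)

    def pdf(z):
        z = np.asarray(z, dtype=float)
        idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, len(counts) - 1)
        out = counts[idx]
        out[(z < edges[0]) | (z >= edges[-1])] = 0.0
        return out

    return pdf


def gbht_sketch(x, n_rounds=20, shrinkage=0.2, seed=0):
    """Stagewise mixture boosting of histogram densities under the NLL loss.

    Each round re-weights the sample by the negative functional gradient of
    the negative log-likelihood (proportional to 1 / current density), fits a
    base histogram on those weights, and mixes it in with a small shrinkage
    step, so the running estimate remains a valid density.
    """
    rng = np.random.default_rng(seed)
    lo, hi = float(x.min()), float(x.max())
    base = 1.0 / (hi - lo)                  # start from the uniform density on the data range
    f_vals = np.full(len(x), base)
    learners = []
    for _ in range(n_rounds):
        w = 1.0 / np.maximum(f_vals, 1e-12) # negative gradient of the NLL at the sample points
        w /= w.sum()
        h = fit_histogram_density(x, w, rng=rng)
        learners.append(h)
        f_vals = (1.0 - shrinkage) * f_vals + shrinkage * h(x)

    def pdf(z):
        z = np.asarray(z, dtype=float)
        out = np.where((z >= lo) & (z <= hi), base, 0.0) * (1.0 - shrinkage) ** n_rounds
        for t, h in enumerate(learners):
            out = out + shrinkage * (1.0 - shrinkage) ** (n_rounds - 1 - t) * h(z)
        return out

    return pdf


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.5, 1.0, 500)])
    density = gbht_sketch(sample)
    grid = np.linspace(-4.0, 5.0, 9)
    print(np.round(density(grid), 4))       # estimated density values on a coarse grid
```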
Related papers
- Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm.
T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in gradient descent.
In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance.
arXiv Detail & Related papers (2024-10-02T14:09:51Z)
- SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning [28.26143547479141]
We introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory).
SHOT barely increases the computational complexity of the baseline model.
We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline.
arXiv Detail & Related papers (2023-10-04T11:43:08Z)
- Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z)
- Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation [56.44946660061753]
This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC).
MSPC requires a spatial perturbation function (T) and the translation operator (G) to commute (i.e., TG = GT).
Our method outperforms the state-of-the-art methods on most I2I benchmarks.
arXiv Detail & Related papers (2022-03-23T19:59:04Z)
- Local Adaptivity of Gradient Boosting in Histogram Transform Ensemble Learning [5.241402683680909]
We propose a gradient boosting algorithm called \textit{adaptive boosting histogram transform} (\textit{ABHT}).
We show that our ABHT can filter out the regions with different orders of smoothness.
arXiv Detail & Related papers (2021-12-05T14:56:56Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called \textit{Gradient Boosted Binary Histogram Ensemble} (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF (a generic error-feedback update is sketched below for context).
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
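For context on the last entry, the following is a minimal Python sketch of a single error-feedback SGD step with random sparsification. It shows plain error feedback, not the detached variant (DEF) proposed in that paper; the residual bookkeeping, names, and parameters here are illustrative assumptions.

```python
import numpy as np

def random_sparsify(v, keep_ratio, rng):
    """Zero out a random subset of coordinates, keeping roughly keep_ratio of them."""
    mask = rng.random(v.shape) < keep_ratio
    return v * mask

def error_feedback_step(w, grad, residual, lr, keep_ratio, rng):
    """One worker-side SGD step with a compressed (sparsified) update.

    The part of the update removed by compression is stored in `residual` and
    added back before the next compression, so the dropped information is not
    lost. Note: this is standard error feedback, not the detached variant
    (DEF) from the paper above.
    """
    full_update = lr * grad + residual          # re-inject previously dropped mass
    sparse_update = random_sparsify(full_update, keep_ratio, rng)
    new_residual = full_update - sparse_update  # carried over to the next step
    return w - sparse_update, new_residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 10_000
    w = np.zeros(dim)
    residual = np.zeros(dim)
    for _ in range(100):
        grad = w - 1.0                           # gradient of the toy objective 0.5 * ||w - 1||^2
        w, residual = error_feedback_step(w, grad, residual, lr=0.1, keep_ratio=0.05, rng=rng)
    print(f"mean |w - 1| after 100 steps: {np.abs(w - 1.0).mean():.3f}")
```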