GBHT: Gradient Boosting Histogram Transform for Density Estimation
- URL: http://arxiv.org/abs/2106.05738v1
- Date: Thu, 10 Jun 2021 13:40:28 GMT
- Title: GBHT: Gradient Boosting Histogram Transform for Density Estimation
- Authors: Jingyi Cui, Hanyuan Hang, Yisen Wang, Zhouchen Lin
- Abstract summary: We propose a density estimation algorithm called \textit{Gradient Boosting Histogram Transform} (GBHT).
We make the first attempt to theoretically explain why boosting can enhance the performance of its base learners for density estimation problems.
- Score: 73.94900378709023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a density estimation algorithm called
\textit{Gradient Boosting Histogram Transform} (GBHT), where we adopt the
\textit{Negative Log Likelihood} as the loss function to make the boosting
procedure applicable to unsupervised tasks. From a learning theory
viewpoint, we first prove fast convergence rates for GBHT with the smoothness
assumption that the underlying density function lies in the space
$C^{0,\alpha}$. Then, when the target density function lies in the space
$C^{1,\alpha}$, we present an upper bound for GBHT which is smaller than the
lower bound of its corresponding base learner, in the sense of convergence
rates. To the best of our knowledge, we make the first attempt to theoretically
explain why boosting can enhance the performance of its base learners for
density estimation problems. In experiments, we not only conduct performance
comparisons with the widely used KDE, but also apply GBHT to anomaly detection
to showcase a further application of GBHT.
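To make the procedure above concrete, here is a minimal Python sketch of boosting histogram density estimators under the negative log-likelihood loss. It is an illustration under simplifying assumptions, not the authors' implementation: it works in one dimension, replaces the full histogram transform (random rotation, stretching, and translation) with a randomly shifted and stretched grid, and uses a stagewise mixture update so the running estimate stays a valid density. All function names and parameter choices are illustrative.

```python
import numpy as np

def fit_histogram_density(x, weights, n_bins=16, rng=None):
    """Weighted 1-D histogram density on a randomly shifted/stretched grid.

    The random shift and stretch loosely imitate the histogram transform used
    as the base learner (which, in d dimensions, also involves random
    rotations); this is only an illustrative stand-in.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = float(x.min()), float(x.max())
    width = (hi - lo) / n_bins
    shift = rng.uniform(0.0, width)        # random translation of the grid
    stretch = rng.uniform(0.8, 1.2)        # random stretching of the bin width
    step = stretch * width
    n_eff = int(np.ceil((hi - (lo - shift)) / step)) + 1
    edges = (lo - shift) + step * np.arange(n_eff + 1)
    counts, edges = np.histogram(x, bins=edges, weights=weights, density=True)

    def pdf(z):
        z = np.asarray(z, dtype=float)
        idx = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, len(counts) - 1)
        out = counts[idx]
        out[(z < edges[0]) | (z >= edges[-1])] = 0.0
        return out

    return pdf


def gbht_sketch(x, n_rounds=20, shrinkage=0.2, seed=0):
    """Stagewise mixture boosting of histogram densities under the NLL loss.

    Each round re-weights the sample by the negative functional gradient of
    the negative log-likelihood (proportional to 1 / current density), fits a
    base histogram on those weights, and mixes it in with a small shrinkage
    step, so the running estimate remains a valid density.
    """
    rng = np.random.default_rng(seed)
    lo, hi = float(x.min()), float(x.max())
    base = 1.0 / (hi - lo)                  # start from the uniform density on the data range
    f_vals = np.full(len(x), base)
    learners = []
    for _ in range(n_rounds):
        w = 1.0 / np.maximum(f_vals, 1e-12) # negative gradient of the NLL at the sample points
        w /= w.sum()
        h = fit_histogram_density(x, w, rng=rng)
        learners.append(h)
        f_vals = (1.0 - shrinkage) * f_vals + shrinkage * h(x)

    def pdf(z):
        z = np.asarray(z, dtype=float)
        out = np.where((z >= lo) & (z <= hi), base, 0.0) * (1.0 - shrinkage) ** n_rounds
        for t, h in enumerate(learners):
            out = out + shrinkage * (1.0 - shrinkage) ** (n_rounds - 1 - t) * h(z)
        return out

    return pdf


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    sample = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.5, 1.0, 500)])
    density = gbht_sketch(sample)
    grid = np.linspace(-4.0, 5.0, 9)
    print(np.round(density(grid), 4))       # estimated density values on a coarse grid
```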
Related papers
- Truncated Kernel Stochastic Gradient Descent on Spheres [1.4583059436979549]
Inspired by the structure of spherical harmonics, we propose the truncated kernel stochastic gradient descent (T-kernel SGD) algorithm.
T-kernel SGD employs a "truncation" operation, enabling the application of series-based kernel functions in gradient descent.
In contrast to traditional kernel SGD, T-kernel SGD is more effective in balancing bias and variance.
arXiv Detail & Related papers (2024-10-02T14:09:51Z)
- SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning [28.26143547479141]
We introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory).
SHOT barely increases the computational complexity of the baseline model.
We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline.
arXiv Detail & Related papers (2023-10-04T11:43:08Z)
- Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z)
- Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation [56.44946660061753]
This paper proposes a universal regularization technique called maximum spatial perturbation consistency (MSPC).
MSPC requires a spatial perturbation function (T) and the translation operator (G) to commute (i.e., TG = GT).
Our method outperforms the state-of-the-art methods on most I2I benchmarks.
arXiv Detail & Related papers (2022-03-23T19:59:04Z)
- Local Adaptivity of Gradient Boosting in Histogram Transform Ensemble Learning [5.241402683680909]
We propose a gradient boosting algorithm called \textit{adaptive boosting histogram transform} (\textit{ABHT}).
We show that our ABHT can filter out the regions with different orders of smoothness.
arXiv Detail & Related papers (2021-12-05T14:56:56Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Gradient Boosted Binary Histogram Ensemble for Large-scale Regression [60.16351608335641]
We propose a gradient boosting algorithm for large-scale regression problems called \textit{Gradient Boosted Binary Histogram Ensemble} (GBBHE), based on binary histogram partition and ensemble learning.
In the experiments, compared with other state-of-the-art algorithms such as gradient boosted regression tree (GBRT), our GBBHE algorithm shows promising performance with less running time on large-scale datasets.
arXiv Detail & Related papers (2021-06-03T17:05:40Z)
- Detached Error Feedback for Distributed SGD with Random Sparsification [98.98236187442258]
The communication bottleneck has been a critical problem in large-scale deep learning.
We propose a new detached error feedback (DEF) algorithm, which shows better convergence than error feedback for non-convex distributed problems.
We also propose DEFA to accelerate the generalization of DEF, which shows better bounds than DEF (a generic error-feedback update is sketched below for context).
arXiv Detail & Related papers (2020-04-11T03:50:59Z)
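For context on the last entry, the following is a minimal Python sketch of a single error-feedback SGD step with random sparsification. It shows plain error feedback, not the detached variant (DEF) proposed in that paper; the residual bookkeeping, names, and parameters here are illustrative assumptions.

```python
import numpy as np

def random_sparsify(v, keep_ratio, rng):
    """Zero out a random subset of coordinates, keeping roughly keep_ratio of them."""
    mask = rng.random(v.shape) < keep_ratio
    return v * mask

def error_feedback_step(w, grad, residual, lr, keep_ratio, rng):
    """One worker-side SGD step with a compressed (sparsified) update.

    The part of the update removed by compression is stored in `residual` and
    added back before the next compression, so the dropped information is not
    lost. Note: this is standard error feedback, not the detached variant
    (DEF) from the paper above.
    """
    full_update = lr * grad + residual          # re-inject previously dropped mass
    sparse_update = random_sparsify(full_update, keep_ratio, rng)
    new_residual = full_update - sparse_update  # carried over to the next step
    return w - sparse_update, new_residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 10_000
    w = np.zeros(dim)
    residual = np.zeros(dim)
    for _ in range(100):
        grad = w - 1.0                           # gradient of the toy objective 0.5 * ||w - 1||^2
        w, residual = error_feedback_step(w, grad, residual, lr=0.1, keep_ratio=0.05, rng=rng)
    print(f"mean |w - 1| after 100 steps: {np.abs(w - 1.0).mean():.3f}")
```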