Random Forest (RF) Kernel for Regression, Classification and Survival
- URL: http://arxiv.org/abs/2009.00089v1
- Date: Mon, 31 Aug 2020 20:21:27 GMT
- Title: Random Forest (RF) Kernel for Regression, Classification and Survival
- Authors: Dai Feng and Richard Baumgartner
- Abstract summary: We elucidate the performance and properties of the data-driven RF kernels used by regularized linear models.
We show that for continuous and survival targets, the RF kernels are competitive to RF in higher dimensional scenarios.
We also provide the results from real life data sets for the regression, classification and survival to illustrate how these insights may be leveraged in practice.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Breiman's random forest (RF) can be interpreted as an implicit kernel
generator, where the ensuing proximity matrix represents the data-driven RF
kernel. The kernel perspective on the RF has been used to develop a principled
framework for theoretical investigation of its statistical properties. However,
the practical utility of the links between kernels and the RF has not been widely
explored and systematically evaluated. The focus of our work is the investigation
of the interplay between kernel methods and the RF. We elucidate the performance
and properties of the data-driven RF kernels used by regularized linear models in
a comprehensive simulation study comprising continuous, binary and survival
targets. We show that for continuous and survival targets, the RF kernels are
competitive to the RF in higher dimensional scenarios with a larger number of
noisy features. For the binary target, the RF kernel and the RF exhibit comparable
performance. As the RF kernel asymptotically converges to the Laplace kernel,
we included it in our evaluation. For most simulation setups, the RF and the
RF kernel outperformed the Laplace kernel. Nevertheless, in some cases the
Laplace kernel was competitive, showing its potential value for applications.
We also provide results from real-life data sets for regression,
classification and survival to illustrate how these insights may be leveraged
in practice. Finally, we discuss further extensions of the RF kernels in the
context of interpretable prototype and landmarking classification, regression
and survival. We outline future lines of research for kernels furnished by
Bayesian counterparts of the RF.
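The kernel construction the abstract refers to can be sketched in a few lines: the RF proximity between two samples is the fraction of trees in which they land in the same leaf, and the resulting proximity matrix can be passed as a precomputed kernel to a regularized linear model. The sketch below is a minimal illustration under assumed choices (the Friedman toy dataset, 100 trees, kernel ridge regression with `alpha=1.0`), not the paper's actual implementation or experimental setup.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge

# Toy regression data (illustrative choice, not the paper's benchmark).
X, y = make_friedman1(n_samples=200, n_features=10, noise=1.0, random_state=0)

# Fit a random forest and record the leaf index of each sample in each tree.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
leaves = rf.apply(X)  # shape: (n_samples, n_trees)

# RF proximity kernel: K[i, j] = fraction of trees in which samples i and j
# fall into the same leaf. This is the data-driven RF kernel.
K = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Plug the precomputed RF kernel into a regularized linear model.
model = KernelRidge(alpha=1.0, kernel="precomputed").fit(K, y)
pred = model.predict(K)
```

For out-of-sample prediction, the proximity between test and training samples (same leaf-sharing fraction, computed from `rf.apply` on both sets) would be passed to `model.predict` instead.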
Related papers
- Optimal Kernel Quantile Learning with Random Features [0.9208007322096533]
This paper presents a generalization study of kernel quantile regression with random features (KQR-RF).
Our study establishes the capacity-dependent learning rates for KQR-RF under mild conditions on the number of RFs.
By slightly modifying our assumptions, the capacity-dependent error analysis can also be applied to cases with Lipschitz continuous losses.
arXiv Detail & Related papers (2024-08-24T14:26:09Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in a context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z) - Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z) - Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs).
HRFs automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z) - Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z) - Early Detection of COVID-19 Hotspots Using Spatio-Temporal Data [66.70036251870988]
The Centers for Disease Control and Prevention (CDC) has worked with other federal agencies to identify counties with increasing coronavirus disease 2019 (COVID-19) incidence (hotspots).
This paper presents a sparse model for early detection of COVID-19 hotspots (at the county level) in the United States.
Deep neural networks are introduced to enhance the model's representative power while still enjoying the interpretability of the kernel.
arXiv Detail & Related papers (2021-05-31T19:28:17Z) - Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z) - (Decision and regression) tree ensemble based kernels for regression and classification [2.28438857884398]
Tree based ensembles such as Breiman's random forest (RF) and Gradient Boosted Trees (GBT) can be interpreted as implicit kernel generators.
We show that for continuous targets, the RF/GBT kernels are competitive to their respective ensembles in higher dimensional scenarios.
We provide the results from real life data sets for regression and classification to show how these insights may be leveraged in practice.
arXiv Detail & Related papers (2020-12-19T16:52:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.