Data-driven multinomial random forest: A new random forest variant with strong consistency
- URL: http://arxiv.org/abs/2211.15154v2
- Date: Mon, 16 Oct 2023 13:49:21 GMT
- Title: Data-driven multinomial random forest: A new random forest variant with strong consistency
- Authors: JunHao Chen
- Abstract summary: We adapt the proof techniques of several previously weakly consistent variants of random forests to establish strong consistency.
We propose a data-driven multinomial random forest (DMRF) that has the same complexity as BreimanRF while achieving strong consistency with probability 1.
- Score: 1.2147145617662436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we strengthen the proof techniques of some previously
weakly consistent variants of random forests so that strong consistency can be
established, and we improve the data utilization of these variants to obtain
better theoretical properties and experimental performance. In addition, we
propose a data-driven multinomial random forest (DMRF), which has the same
complexity as BreimanRF (proposed by Breiman) while achieving strong
consistency with probability 1. DMRF performs better on classification and
regression problems than previous RF variants that satisfy only weak
consistency, and in most cases it even surpasses BreimanRF on classification
tasks. To the best of our knowledge, DMRF is currently a low-complexity,
high-performing variant of random forests that achieves strong consistency
with probability 1.
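The abstract does not spell out the sampling mechanism, so the following Python sketch only illustrates one plausible reading of "data-driven multinomial" splitting: candidate thresholds are drawn from a multinomial distribution weighted by their impurity decrease. The softmax weighting and the temperature `tau` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def multinomial_split(x, y, tau=0.1, rng=None):
    """Sample a split threshold on feature x from a multinomial distribution
    whose weights grow with each candidate's impurity decrease (softmax over
    Gini gains; `tau` is an assumed temperature, not a DMRF parameter)."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.unique(x)
    thresholds = (u[:-1] + u[1:]) / 2.0            # candidate midpoints
    parent = gini(y)
    gains = np.array([
        parent - (np.mean(x <= t) * gini(y[x <= t])
                  + np.mean(x > t) * gini(y[x > t]))
        for t in thresholds
    ])
    probs = np.exp(gains / tau)
    probs /= probs.sum()                            # multinomial over splits
    return rng.choice(thresholds, p=probs)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(multinomial_split(x, y))                      # usually near 3.5
```

Because the choice is stochastic rather than greedy, every split with positive gain retains some probability of being selected, which is the kind of data-driven randomization that consistency arguments for RF variants typically rely on.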
Related papers
- Exogenous Randomness Empowering Random Forests [4.396860522241306]
We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests.
Our findings show that feature subsampling reduces both the bias and the variance of random forests relative to individual trees.
Our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests.
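As a quick empirical companion to the subsampling claim, the hedged sketch below contrasts forests with and without feature subsampling on data containing many noise features. It is a toy check with scikit-learn, not the paper's non-asymptotic MSE analysis.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Friedman #1: 5 informative features, the remaining 20 are pure noise.
X, y = make_friedman1(n_samples=2000, n_features=25, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for label, max_features in [("no subsampling", 1.0), ("subsampled (sqrt)", "sqrt")]:
    rf = RandomForestRegressor(n_estimators=200, max_features=max_features,
                               random_state=0).fit(X_tr, y_tr)
    print(label, mean_squared_error(y_te, rf.predict(X_te)))
```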
arXiv Detail & Related papers (2024-11-12T05:06:10Z)
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
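The abstract does not specify the balancing rule, so the sketch below only caricatures the underlying idea: splitting at the empirical median so that both children receive comparable sample sizes. Treat `balanced_split` as a hypothetical illustration, not the paper's adaptive estimator.

```python
import numpy as np

def balanced_split(x):
    """Split a feature at its empirical median so both children receive
    roughly half of the samples (a caricature of split balancing; the
    paper's method adaptively learns the tree structure from the data)."""
    t = np.median(x)
    return x <= t, x > t, t

x = np.random.default_rng(0).exponential(size=101)  # heavily skewed feature
left, right, t = balanced_split(x)
print(left.sum(), right.sum(), round(t, 3))         # ~50/50 despite the skew
```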
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
This lets us induce a chosen latent distribution instead of the one implicitly assumed by a standard softmax layer.
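To make the ELBO analogy concrete, here is a minimal numpy sketch of a classifier whose softmax inputs are sampled from a Gaussian q(z|x) and trained on E_q[log p(y|z)] - KL(q(z|x) || p(z)). The linear read-out and the standard-normal prior are assumptions for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_loss(mu, log_var, y, W, b, n_samples=8):
    """Monte-Carlo ELBO for one input: latent z ~ N(mu, diag(exp(log_var))),
    class logits W @ z + b, standard-normal prior on z (an assumed prior)."""
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + np.exp(0.5 * log_var) * eps                # reparameterization
    logits = z @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_py = logits[:, y] - np.log(np.exp(logits).sum(axis=1))
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)  # KL to N(0, I)
    return -(log_py.mean() - kl)

# toy check: 4-dim latent, 3 classes
mu, log_var = rng.standard_normal(4), np.zeros(4)
W, b = rng.standard_normal((3, 4)), np.zeros(3)
print(elbo_loss(mu, log_var, y=1, W=W, b=b))
```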
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
The Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF, the Implicitly Normalized Forecaster with clipping (INFclip), for MAB problems in heavy-tailed settings.
We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
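A minimal sketch of the clipping idea under heavy-tailed losses: clip each observed loss before forming importance-weighted estimates and feed them to an exponential-weights learner. The real INFclip uses an implicitly normalized (Tsallis-entropy) update with tuned clipping levels, so the schedule `B = t**0.25` and the update below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 20000
means = rng.uniform(0.0, 1.0, size=K)           # unknown mean losses per arm
cum = np.zeros(K)                                # cumulative loss estimates

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                       # decaying learning rate
    w = np.exp(-eta * (cum - cum.min()))         # exponential weights (stable)
    p = w / w.sum()
    arm = rng.choice(K, p=p)
    loss = means[arm] + rng.standard_t(df=1.5)   # infinite-variance noise
    B = t ** 0.25                                # clipping level (assumed)
    cum[arm] += np.clip(loss, -B, B) / p[arm]    # clipped IW estimate

print("true best arm:", means.argmin(), "algorithm's pick:", cum.argmin())
```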
arXiv Detail & Related papers (2023-05-11T12:00:43Z)
- Data-driven multinomial random forest [2.1828601975620257]
We propose a data-driven multinomial random forest (DMRF) algorithm, which has lower complexity than MRF and higher complexity than BRF.
To the best of our knowledge, DMRF is currently the best-performing strongly consistent RF variant with low algorithmic complexity.
arXiv Detail & Related papers (2023-04-09T14:04:56Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
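The mixup step at the heart of the scheme can be caricatured as blending a clean input with a near off-class point while blending its label toward the uniform distribution. The sketch below shows only that convex combination; the off-class point `x_off` is a random stand-in here, whereas SmoothMix finds it by attacking the smoothed classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothmix_pair(x, y_onehot, x_off, n_classes, lam=None):
    """Mix input x with an off-class point x_off and mix its label toward
    the uniform distribution (a simplified stand-in for SmoothMix's
    self-mixup step)."""
    lam = rng.uniform() if lam is None else lam
    x_mix = lam * x + (1 - lam) * x_off
    y_mix = lam * y_onehot + (1 - lam) * np.full(n_classes, 1.0 / n_classes)
    return x_mix, y_mix

x = rng.standard_normal(8)
x_off = x + 0.5 * rng.standard_normal(8)   # stand-in for an adversarial point
y = np.eye(3)[0]
print(smoothmix_pair(x, y, x_off, n_classes=3))
```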
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled way for adaptive integration of different modalities and produces trustworthy regression results.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
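Each modality in this setting emits Normal-Inverse-Gamma parameters (mu, nu, alpha, beta), from which standard evidential-regression identities yield aleatoric and epistemic uncertainty. The sketch below computes those two quantities; the modality-fusion rule, which is the paper's contribution, is not reproduced here.

```python
def nig_uncertainties(mu, nu, alpha, beta):
    """Closed-form uncertainties of a Normal-Inverse-Gamma posterior
    (standard evidential-regression identities; requires alpha > 1):
      aleatoric  E[sigma^2]  = beta / (alpha - 1)
      epistemic  Var[mu_hat] = beta / (nu * (alpha - 1))
    `mu` is the point prediction and does not enter the uncertainties."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic

# one modality's predicted NIG parameters (illustrative values)
print(nig_uncertainties(mu=0.3, nu=2.0, alpha=3.0, beta=1.5))
```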
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Learning generative models for valid knockoffs using novel multivariate-rank based statistics [12.528602250193206]
Rank energy (RE) is derived using theoretical results characterizing the optimal maps in Monge's Optimal Transport (OT) problem.
We propose a variant of RE, dubbed soft rank energy (sRE), and its kernel variant, called soft rank maximum mean discrepancy (sRMMD).
We then use sRMMD to generate deep knockoffs and show via extensive evaluation that it is a novel and effective method to produce valid knockoffs.
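sRMMD is a rank-map-based variant of the maximum mean discrepancy. For orientation, here is the plain Gaussian-kernel MMD^2 estimator between two samples; the soft rank map computed via entropic optimal transport is omitted, so this is background, not the proposed statistic.

```python
import numpy as np

def gaussian_mmd2(X, Y, sigma=1.0):
    """Biased estimator of squared MMD between samples X and Y under a
    Gaussian kernel (plain MMD; sRMMD would first push the samples
    through an entropic-OT soft rank map, omitted here)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 0.5        # shifted sample
print(gaussian_mmd2(X, Y))                     # clearly larger than for X vs X
```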
arXiv Detail & Related papers (2021-10-29T18:51:19Z)
- Towards Robust Classification with Deep Generative Forests [13.096855747795303]
Decision Trees and Random Forests are among the most widely used machine learning models.
Being primarily discriminative models, they lack principled methods for manipulating the uncertainty of their predictions.
We exploit Generative Forests (GeFs) to extend Random Forests to generative models representing the full joint distribution over the feature space.
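A toy rendition of the idea: attach a density model to each leaf (here a diagonal Gaussian, purely an illustrative stand-in; GeFs actually use probabilistic circuits) so the model can score p(x) and flag outliers where its predictions should not be trusted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
leaves = tree.apply(X)

# Per-leaf diagonal Gaussians: a crude generative head on top of the tree
# (an assumed stand-in for GeFs' probabilistic circuits).
stats = {l: (X[leaves == l].mean(0), X[leaves == l].std(0) + 1e-6,
             (leaves == l).mean()) for l in np.unique(leaves)}

def log_px(x):
    """Approximate log p(x): leaf weight times the leaf's Gaussian density."""
    l = tree.apply(x.reshape(1, -1))[0]
    mu, sd, w = stats[l]
    return np.log(w) - 0.5 * np.sum(((x - mu) / sd) ** 2
                                    + np.log(2 * np.pi * sd ** 2))

print(log_px(np.array([0.0, 0.0])))   # in-distribution: higher
print(log_px(np.array([8.0, -8.0])))  # outlier: much lower, could abstain
```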
arXiv Detail & Related papers (2020-07-11T08:57:52Z)
- Lower bounds in multiple testing: A framework based on derandomized proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
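For reference, the Benjamini-Hochberg step-up procedure the simulations are compared against sorts the p-values and rejects the k smallest, where k is the largest index with p_(k) <= k*alpha/m. A standard implementation:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest index with p_(k) <= k * alpha / m. Returns a boolean mask."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest passing index
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))    # rejects the first two
```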
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
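For orientation, test-time randomized smoothing takes a majority vote of a base classifier over random perturbations; the sketch below shows that vote under Gaussian input noise. For label-flipping attacks the randomness is instead injected into the training labels, so this is an analogy to the general construction, not the paper's certificate.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_predict(f, x, sigma=0.5, n=1000):
    """Majority vote of base classifier f over Gaussian perturbations of x
    (the paper's unifying view smooths arbitrary functions, including the
    training pipeline itself; this shows only input smoothing)."""
    noise = sigma * rng.standard_normal((n, x.size))
    votes = np.array([f(x + eps) for eps in noise])
    counts = np.bincount(votes)
    return counts.argmax(), counts.max() / n    # label and its vote share

f = lambda z: int(z.sum() > 0)                  # toy base classifier
print(smoothed_predict(f, np.array([0.3, -0.1])))
```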
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.