Data-driven multinomial random forest
- URL: http://arxiv.org/abs/2304.04240v1
- Date: Sun, 9 Apr 2023 14:04:56 GMT
- Title: Data-driven multinomial random forest
- Authors: Junhao Chen, Xueli Wang
- Abstract summary: We propose a data-driven multinomial random forest (DMRF) algorithm, which has lower complexity than MRF and higher complexity than BRF.
To the best of our knowledge, DMRF is currently the best-performing strongly consistent RF variant with low algorithmic complexity.
- Score: 2.1828601975620257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article, we strengthen the proof methods of some previously weakly
consistent variants of random forests into strongly consistent proof methods,
and improve the data utilization of these variants, in order to obtain better
theoretical properties and experimental performance. In addition, based on the
multinomial random forest (MRF) and Bernoulli random forest (BRF), we propose a
data-driven multinomial random forest (DMRF) algorithm, which has lower
complexity than MRF and higher complexity than BRF while satisfying strong
consistency. It has better performance in classification and regression
problems than previous RF variants that only satisfy weak consistency, and in
most cases even surpasses the standard random forest. To the best of our
knowledge, DMRF is currently the best-performing strongly consistent RF variant
with low algorithmic complexity.
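The splitting idea shared by MRF-style forests can be illustrated with a small sketch: rather than always taking the impurity-maximizing split, a split threshold is drawn from a multinomial distribution whose probabilities grow with each candidate's impurity decrease. This is a minimal illustrative sketch, not the paper's exact algorithm; the softmax weighting and the `temperature` parameter are assumptions made here for concreteness.

```python
import math
import random

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def multinomial_split(xs, ys, temperature=1.0):
    """Sample a split threshold from a multinomial distribution whose
    probabilities increase with the impurity decrease of each candidate
    split (softmax over gains), instead of taking the argmax."""
    parent = gini(ys)
    candidates, gains = [], []
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = parent - (len(left) * gini(left)
                         + len(right) * gini(right)) / len(ys)
        candidates.append(t)
        gains.append(gain)
    # Softmax weighting: higher-gain splits are chosen more often,
    # but every candidate keeps nonzero probability.
    weights = [math.exp(g / temperature) for g in gains]
    return random.choices(candidates, weights=weights, k=1)[0]
```

Keeping every candidate split at nonzero probability is what makes consistency arguments tractable for these variants, at the cost of occasionally picking a suboptimal split.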
Related papers
- BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time.
Specifically, BOND is a distribution-matching algorithm that pushes the distribution of the policy's generations closer to the Best-of-N distribution.
We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
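The Best-of-N procedure that BOND aims to emulate is simple to state: draw N candidates from the policy and keep the one with the highest reward. A minimal sketch under assumed interfaces (`generate` and `reward` are hypothetical callables, not BOND's actual API):

```python
def best_of_n(generate, reward, n=4):
    """Best-of-N sampling: draw n candidates from the policy and keep
    the one with the highest reward. BOND trains the policy to match
    the distribution this procedure induces, so that a single sample
    at inference time behaves like the best of n."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)
```

The computational overhead BOND removes is exactly the factor-of-n generation cost in this loop.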
arXiv Detail & Related papers (2024-07-19T18:38:25Z) - A New Random Forest Ensemble of Intuitionistic Fuzzy Decision Trees [5.831659043074847]
We propose a new random forest ensemble of intuitionistic fuzzy decision trees (IFDT)
The proposed method benefits from the randomness of bootstrap sampling and feature selection.
This study is the first to propose a random forest ensemble based on the intuitionistic fuzzy theory.
arXiv Detail & Related papers (2024-03-12T06:52:24Z) - Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
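The objective being minimized here, the worst-case risk across $k$ distributions, can be written out directly: evaluate the model's average loss on each distribution and take the maximum. A minimal sketch with hypothetical interfaces (finite samples stand in for the distributions; `model` and `loss` are assumed callables):

```python
def worst_case_risk(model, distributions, loss):
    """Worst-case risk across k data distributions: the objective that
    multi-distribution learning minimizes. Each distribution is given
    here as a finite sample of (x, y) pairs."""
    risks = []
    for sample in distributions:
        avg = sum(loss(model, x, y) for x, y in sample) / len(sample)
        risks.append(avg)
    return max(risks)
```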
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Optimal Weighted Random Forests [8.962539518822684]
The random forest (RF) algorithm has become a very popular prediction method for its great flexibility and promising accuracy.
We propose two optimal algorithms, namely the 1 Step Weighted RF (1step-WRF$_{\mathrm{opt}}$) and the 2 Steps Optimal Weighted RF (2steps-WRF$_{\mathrm{opt}}$).
Numerical studies conducted on real-world data sets indicate that these algorithms outperform the equal-weight forest and two other weighted RFs proposed in existing literature.
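The mechanism these weighted forests share can be sketched in a few lines: replace the equal-weight average over trees with a weighted combination using optimized per-tree weights. This sketch shows only the prediction step, not the weight-optimization procedures the paper proposes; the function name and interface are assumptions.

```python
def weighted_forest_predict(trees, weights, x):
    """Weighted random forest prediction for regression: combine tree
    outputs with per-tree weights instead of the equal-weight average.
    How the weights are chosen (the 1-step / 2-steps optimization) is
    not shown here."""
    assert len(trees) == len(weights)
    total = sum(weights)
    return sum(w * t(x) for t, w in zip(trees, weights)) / total
```

With all weights equal, this reduces to the standard equal-weight forest the paper uses as a baseline.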
arXiv Detail & Related papers (2023-05-17T08:36:43Z) - Implicitly normalized forecaster with clipping for linear and non-linear
heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INFclip) for MAB problems with heavy-tailed rewards.
We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
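The clipping ingredient can be illustrated with a truncated-mean estimator: bounding each observation before averaging trades a small bias for bounded variance under heavy-tailed rewards. This is a generic illustration of the clipping idea only, not the INFclip update itself; the threshold choice is an assumption.

```python
def clipped_mean(rewards, threshold):
    """Truncated-mean estimator: clip each observation to
    [-threshold, threshold] before averaging. A single extreme
    observation can no longer dominate the estimate."""
    clipped = [max(-threshold, min(r, threshold)) for r in rewards]
    return sum(clipped) / len(clipped)
```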
arXiv Detail & Related papers (2023-05-11T12:00:43Z) - Data-driven multinomial random forest: A new random forest variant with
strong consistency [1.2147145617662436]
We modify the proof methods of some previously weakly consistent variants of random forests into strongly consistent proof methods.
We propose a data-driven multinomial random forest (DMRF) which has the same complexity as BreimanRF while satisfying strong consistency with probability 1.
arXiv Detail & Related papers (2022-11-28T09:08:23Z) - Sequential Permutation Testing of Random Forest Variable Importance
Measures [68.8204255655161]
It is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests.
The results of simulation studies confirm that the theoretical properties of the sequential tests apply.
The numerical stability of the methods is investigated in two additional application studies.
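The cost saving from sequential permutation testing comes from early stopping: draw permutation statistics one at a time and stop as soon as enough of them reach the observed statistic, since the p-value is then already known to be large. A minimal sketch with an assumed interface (`permute_stat` is a hypothetical callable returning one permutation statistic per call; the stopping rule `h` is a common choice, not necessarily the paper's):

```python
def sequential_permutation_pvalue(observed, permute_stat,
                                  max_perms=1000, h=10):
    """Sequential permutation p-value: stop as soon as h permutation
    statistics reach the observed one, estimating p = h / n_done,
    instead of always running all max_perms permutations."""
    exceed = 0
    for n in range(1, max_perms + 1):
        if permute_stat() >= observed:
            exceed += 1
            if exceed >= h:
                # Early stop: the p-value is at least h / n, which is
                # already too large to be interesting.
                return exceed / n
    # Standard permutation estimate with the +1 correction.
    return (exceed + 1) / (max_perms + 1)
```

For clearly non-significant variables, the loop typically terminates after a few dozen permutations rather than thousands.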
arXiv Detail & Related papers (2022-06-02T20:16:50Z) - Mixed Variable Bayesian Optimization with Frequency Modulated Kernels [96.78099706164747]
We propose the frequency modulated (FM) kernel flexibly modeling dependencies among different types of variables.
BO-FM outperforms competitors including Regularized Evolution (RE) and BOHB.
arXiv Detail & Related papers (2021-02-25T11:28:46Z) - Crossbreeding in Random Forest [5.8010446129208155]
Ensemble learning methods are designed to benefit from multiple learning algorithms for better predictive performance.
The tradeoff of this improved performance is slower speed and larger size of ensemble learning systems compared to single learning systems.
We present a novel approach to deal with this problem in Random Forest (RF) as one of the most powerful ensemble methods.
arXiv Detail & Related papers (2021-01-21T12:58:54Z) - Lower bounds in multiple testing: A framework based on derandomized
proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
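The Benjamini-Hochberg procedure referenced here is a standard step-up rule: sort the $m$ p-values, find the largest rank $k$ with $p_{(k)} \le k\alpha/m$, and reject the $k$ hypotheses with the smallest p-values. A compact sketch:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank k
    with p_(k) <= k * alpha / m and reject the k smallest p-values.
    Returns the indices (into the original list) of rejections."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # step-up: keep the LARGEST qualifying rank
    return sorted(order[:k_max])
```

Note the step-up character: a late p-value that clears its threshold can rescue earlier ones that did not.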
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.