Data-driven multinomial random forest: A new random forest variant with strong consistency
- URL: http://arxiv.org/abs/2211.15154v2
- Date: Mon, 16 Oct 2023 13:49:21 GMT
- Title: Data-driven multinomial random forest: A new random forest variant with strong consistency
- Authors: JunHao Chen
- Abstract summary: We adapt the proof techniques of several previously weakly consistent variants of random forests to establish strong consistency.
We propose a data-driven multinomial random forest (DMRF) that has the same complexity as BreimanRF while achieving strong consistency with probability 1.
- Score: 1.2147145617662436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we strengthen the proof techniques of some previously
weakly consistent variants of random forests so that strong consistency can be
established, and we improve the data utilization of these variants to obtain
better theoretical properties and experimental performance. In addition, we
propose a data-driven multinomial random forest (DMRF), which has the same
complexity as BreimanRF (proposed by Breiman) while achieving strong
consistency with probability 1. DMRF performs better on classification and
regression problems than previous RF variants that satisfy only weak
consistency, and in most cases it even surpasses BreimanRF on classification
tasks. To the best of our knowledge, DMRF is currently a low-complexity,
high-performing variant of random forests that achieves strong consistency
with probability 1.
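The abstract does not spell out the sampling mechanism, so the following Python sketch only illustrates one plausible reading of "data-driven multinomial" splitting: candidate thresholds are drawn from a multinomial distribution weighted by their impurity decrease. The softmax weighting and the temperature `tau` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def multinomial_split(x, y, tau=0.1, rng=None):
    """Sample a split threshold on feature x from a multinomial distribution
    whose weights grow with each candidate's impurity decrease (softmax over
    Gini gains; `tau` is an assumed temperature, not a DMRF parameter)."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.unique(x)
    thresholds = (u[:-1] + u[1:]) / 2.0            # candidate midpoints
    parent = gini(y)
    gains = np.array([
        parent - (np.mean(x <= t) * gini(y[x <= t])
                  + np.mean(x > t) * gini(y[x > t]))
        for t in thresholds
    ])
    probs = np.exp(gains / tau)
    probs /= probs.sum()                            # multinomial over splits
    return rng.choice(thresholds, p=probs)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(multinomial_split(x, y))                      # usually near 3.5
```

Because the choice is stochastic rather than greedy, every split with positive gain retains some probability of being selected, which is the kind of data-driven randomization that consistency arguments for RF variants typically rely on.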
Related papers
- Exogenous Randomness Empowering Random Forests [4.396860522241306]
We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests.
Our findings show that feature subsampling reduces both the bias and the variance of random forests relative to individual trees.
Our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests.
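As a quick empirical companion to the subsampling claim, the hedged sketch below contrasts forests with and without feature subsampling on data containing many noise features. It is a toy check with scikit-learn, not the paper's non-asymptotic MSE analysis.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Friedman #1: 5 informative features, the remaining 20 are pure noise.
X, y = make_friedman1(n_samples=2000, n_features=25, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for label, max_features in [("no subsampling", 1.0), ("subsampled (sqrt)", "sqrt")]:
    rf = RandomForestRegressor(n_estimators=200, max_features=max_features,
                               random_state=0).fit(X_tr, y_tr)
    print(label, mean_squared_error(y_te, rf.predict(X_te)))
```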
arXiv Detail & Related papers (2024-11-12T05:06:10Z)
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
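The abstract does not specify the balancing rule, so the sketch below only caricatures the underlying idea: splitting at the empirical median so that both children receive comparable sample sizes. Treat `balanced_split` as a hypothetical illustration, not the paper's adaptive estimator.

```python
import numpy as np

def balanced_split(x):
    """Split a feature at its empirical median so both children receive
    roughly half of the samples (a caricature of split balancing; the
    paper's method adaptively learns the tree structure from the data)."""
    t = np.median(x)
    return x <= t, x > t, t

x = np.random.default_rng(0).exponential(size=101)  # heavily skewed feature
left, right, t = balanced_split(x)
print(left.sum(), right.sum(), round(t, 3))         # ~50/50 despite the skew
```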
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Variational Classification [51.2541371924591]
Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency.
We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders.
This lets us induce a chosen latent distribution instead of the one implicitly assumed by a standard softmax layer.
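To make the ELBO analogy concrete, here is a minimal numpy sketch of a classifier whose softmax inputs are sampled from a Gaussian q(z|x) and trained on E_q[log p(y|z)] - KL(q(z|x) || p(z)). The linear read-out and the standard-normal prior are assumptions for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_loss(mu, log_var, y, W, b, n_samples=8):
    """Monte-Carlo ELBO for one input: latent z ~ N(mu, diag(exp(log_var))),
    class logits W @ z + b, standard-normal prior on z (an assumed prior)."""
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + np.exp(0.5 * log_var) * eps                # reparameterization
    logits = z @ W.T + b
    logits -= logits.max(axis=1, keepdims=True)          # stable log-softmax
    log_py = logits[:, y] - np.log(np.exp(logits).sum(axis=1))
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)  # KL to N(0, I)
    return -(log_py.mean() - kl)

# toy check: 4-dim latent, 3 classes
mu, log_var = rng.standard_normal(4), np.zeros(4)
W, b = rng.standard_normal((3, 4)), np.zeros(3)
print(elbo_loss(mu, log_var, y=1, W=W, b=b))
```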
arXiv Detail & Related papers (2023-05-17T17:47:19Z)
- Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
The Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF, the Implicitly Normalized Forecaster with clipping (INFclip), for MAB problems in heavy-tailed settings.
We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
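A minimal sketch of the clipping idea under heavy-tailed losses: clip each observed loss before forming importance-weighted estimates and feed them to an exponential-weights learner. The real INFclip uses an implicitly normalized (Tsallis-entropy) update with tuned clipping levels, so the schedule `B = t**0.25` and the update below are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 20000
means = rng.uniform(0.0, 1.0, size=K)           # unknown mean losses per arm
cum = np.zeros(K)                                # cumulative loss estimates

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)                       # decaying learning rate
    w = np.exp(-eta * (cum - cum.min()))         # exponential weights (stable)
    p = w / w.sum()
    arm = rng.choice(K, p=p)
    loss = means[arm] + rng.standard_t(df=1.5)   # infinite-variance noise
    B = t ** 0.25                                # clipping level (assumed)
    cum[arm] += np.clip(loss, -B, B) / p[arm]    # clipped IW estimate

print("true best arm:", means.argmin(), "algorithm's pick:", cum.argmin())
```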
arXiv Detail & Related papers (2023-05-11T12:00:43Z)
- Data-driven multinomial random forest [2.1828601975620257]
We propose a data-driven multinomial random forest (DMRF) algorithm, which has lower complexity than MRF and higher complexity than BRF.
To the best of our knowledge, DMRF is currently the best-performing strongly consistent RF variant with low algorithmic complexity.
arXiv Detail & Related papers (2023-04-09T14:04:56Z)
- SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness [61.212486108346695]
We propose a training scheme, coined SmoothMix, to control the robustness of smoothed classifiers via self-mixup.
The proposed procedure effectively identifies over-confident, near off-class samples as a cause of limited robustness.
Our experimental results demonstrate that the proposed method can significantly improve the certified $\ell_2$-robustness of smoothed classifiers.
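The mixup step at the heart of the scheme can be caricatured as blending a clean input with a near off-class point while blending its label toward the uniform distribution. The sketch below shows only that convex combination; the off-class point `x_off` is a random stand-in here, whereas SmoothMix finds it by attacking the smoothed classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothmix_pair(x, y_onehot, x_off, n_classes, lam=None):
    """Mix input x with an off-class point x_off and mix its label toward
    the uniform distribution (a simplified stand-in for SmoothMix's
    self-mixup step)."""
    lam = rng.uniform() if lam is None else lam
    x_mix = lam * x + (1 - lam) * x_off
    y_mix = lam * y_onehot + (1 - lam) * np.full(n_classes, 1.0 / n_classes)
    return x_mix, y_mix

x = rng.standard_normal(8)
x_off = x + 0.5 * rng.standard_normal(8)   # stand-in for an adversarial point
y = np.eye(3)[0]
print(smoothmix_pair(x, y, x_off, n_classes=3))
```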
arXiv Detail & Related papers (2021-11-17T18:20:59Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled way for adaptive integration of different modalities and produces trustworthy regression results.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
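Each modality in this setting emits Normal-Inverse-Gamma parameters (mu, nu, alpha, beta), from which standard evidential-regression identities yield aleatoric and epistemic uncertainty. The sketch below computes those two quantities; the modality-fusion rule, which is the paper's contribution, is not reproduced here.

```python
def nig_uncertainties(mu, nu, alpha, beta):
    """Closed-form uncertainties of a Normal-Inverse-Gamma posterior
    (standard evidential-regression identities; requires alpha > 1):
      aleatoric  E[sigma^2]  = beta / (alpha - 1)
      epistemic  Var[mu_hat] = beta / (nu * (alpha - 1))
    `mu` is the point prediction and does not enter the uncertainties."""
    aleatoric = beta / (alpha - 1.0)
    epistemic = beta / (nu * (alpha - 1.0))
    return aleatoric, epistemic

# one modality's predicted NIG parameters (illustrative values)
print(nig_uncertainties(mu=0.3, nu=2.0, alpha=3.0, beta=1.5))
```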
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Learning generative models for valid knockoffs using novel multivariate-rank based statistics [12.528602250193206]
Rank energy (RE) is derived using theoretical results characterizing the optimal maps in Monge's Optimal Transport (OT) problem.
We propose a variant of RE, dubbed soft rank energy (sRE), and its kernel variant, called soft rank maximum mean discrepancy (sRMMD).
We then use sRMMD to generate deep knockoffs and show via extensive evaluation that it is a novel and effective method to produce valid knockoffs.
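sRMMD is a rank-map-based variant of the maximum mean discrepancy. For orientation, here is the plain Gaussian-kernel MMD^2 estimator between two samples; the soft rank map computed via entropic optimal transport is omitted, so this is background, not the proposed statistic.

```python
import numpy as np

def gaussian_mmd2(X, Y, sigma=1.0):
    """Biased estimator of squared MMD between samples X and Y under a
    Gaussian kernel (plain MMD; sRMMD would first push the samples
    through an entropic-OT soft rank map, omitted here)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 0.5        # shifted sample
print(gaussian_mmd2(X, Y))                     # clearly larger than for X vs X
```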
arXiv Detail & Related papers (2021-10-29T18:51:19Z)
- Towards Robust Classification with Deep Generative Forests [13.096855747795303]
Decision Trees and Random Forests are among the most widely used machine learning models.
Being primarily discriminative models, they lack principled methods for manipulating the uncertainty of their predictions.
We exploit Generative Forests (GeFs) to extend Random Forests to generative models representing the full joint distribution over the feature space.
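A toy rendition of the idea: attach a density model to each leaf (here a diagonal Gaussian, purely an illustrative stand-in; GeFs actually use probabilistic circuits) so the model can score p(x) and flag outliers where its predictions should not be trusted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
leaves = tree.apply(X)

# Per-leaf diagonal Gaussians: a crude generative head on top of the tree
# (an assumed stand-in for GeFs' probabilistic circuits).
stats = {l: (X[leaves == l].mean(0), X[leaves == l].std(0) + 1e-6,
             (leaves == l).mean()) for l in np.unique(leaves)}

def log_px(x):
    """Approximate log p(x): leaf weight times the leaf's Gaussian density."""
    l = tree.apply(x.reshape(1, -1))[0]
    mu, sd, w = stats[l]
    return np.log(w) - 0.5 * np.sum(((x - mu) / sd) ** 2
                                    + np.log(2 * np.pi * sd ** 2))

print(log_px(np.array([0.0, 0.0])))   # in-distribution: higher
print(log_px(np.array([8.0, -8.0])))  # outlier: much lower, could abstain
```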
arXiv Detail & Related papers (2020-07-11T08:57:52Z)
- Lower bounds in multiple testing: A framework based on derandomized proxies [107.69746750639584]
This paper introduces an analysis strategy based on derandomization, illustrated by applications to various concrete models.
We provide numerical simulations of some of these lower bounds, and show a close relation to the actual performance of the Benjamini-Hochberg (BH) algorithm.
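For reference, the Benjamini-Hochberg step-up procedure the simulations are compared against sorts the p-values and rejects the k smallest, where k is the largest index with p_(k) <= k*alpha/m. A standard implementation:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest index with p_(k) <= k * alpha / m. Returns a boolean mask."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest passing index
        reject[order[: k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, alpha=0.05))    # rejects the first two
```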
arXiv Detail & Related papers (2020-05-07T19:59:51Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
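For orientation, test-time randomized smoothing takes a majority vote of a base classifier over random perturbations; the sketch below shows that vote under Gaussian input noise. For label-flipping attacks the randomness is instead injected into the training labels, so this is an analogy to the general construction, not the paper's certificate.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_predict(f, x, sigma=0.5, n=1000):
    """Majority vote of base classifier f over Gaussian perturbations of x
    (the paper's unifying view smooths arbitrary functions, including the
    training pipeline itself; this shows only input smoothing)."""
    noise = sigma * rng.standard_normal((n, x.size))
    votes = np.array([f(x + eps) for eps in noise])
    counts = np.bincount(votes)
    return counts.argmax(), counts.max() / n    # label and its vote share

f = lambda z: int(z.sum() > 0)                  # toy base classifier
print(smoothed_predict(f, np.array([0.3, -0.1])))
```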
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.