Related papers: DiNo and RanBu: Lightweight Predictions from Shallow Random Forests

DiNo and RanBu: Lightweight Predictions from Shallow Random Forests

URL: http://arxiv.org/abs/2510.23624v1
Date: Thu, 23 Oct 2025 20:12:08 GMT
Title: DiNo and RanBu: Lightweight Predictions from Shallow Random Forests
Authors: Tiago Mendonça dos Santos, Rafael Izbicki, Luís Gustavo Esteves,
Abstract summary: DiNo and RanBu convert a small set of depth-limited trees into efficient, distance-weighted predictors.<n>RanBu matches or exceeds the accuracy of full-depth random forests in high-noise settings.<n>Both methods extend directly to quantile regression, maintaining accuracy with substantial speed gains.
Score: 2.2080796858692575
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting deployment in latency-sensitive or resource-constrained environments. We introduce DiNo (Distance with Nodes) and RanBu (Random Bushes), two shallow-forest methods that convert a small set of depth-limited trees into efficient, distance-weighted predictors. DiNo measures cophenetic distances via the most recent common ancestor of observation pairs, while RanBu applies kernel smoothing to Breiman's classical proximity measure. Both approaches operate entirely after forest training: no additional trees are grown, and tuning of the single bandwidth parameter $h$ requires only lightweight matrix-vector operations. Across three synthetic benchmarks and 25 public datasets, RanBu matches or exceeds the accuracy of full-depth random forests-particularly in high-noise settings-while reducing training plus inference time by up to 95\%. DiNo achieves the best bias-variance trade-off in low-noise regimes at a modest computational cost. Both methods extend directly to quantile regression, maintaining accuracy with substantial speed gains. The implementation is available as an open-source R/C++ package at https://github.com/tiagomendonca/dirf. We focus on structured tabular random samples (i.i.d.), leaving extensions to other modalities for future work.

Related papers

Jump Like A Squirrel: Optimized Execution Step Order for Anytime Random Forest Inference [2.610730315907993]
We realize decision trees and random forests as anytime algorithms on the granularity of single steps in trees.<n>We propose the Optimal Order, which finds a step order with a maximal mean accuracy in exponential runtime.<n>Our evaluation shows, that the Backward Squirrel Order performs $sim94%$ as well as the Optimal Order and $sim99%$ as well as all other step orders.
arXiv Detail & Related papers (2026-03-02T08:15:37Z)
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling [65.46347858249295]
TreePO is a self-guided rollout algorithm that views sequence generation as a tree-structured searching process.<n>TreePO essentially reduces the per-update compute burden while preserving or enhancing exploration diversity.
arXiv Detail & Related papers (2025-08-24T16:52:37Z)
TreeRPO: Tree Relative Policy Optimization [65.51935468270916]
name is a novel method that estimates the mathematical expectations of rewards at various reasoning steps using tree sampling.<n>Building on the group-relative reward training mechanism of GRPO, name innovatively computes rewards based on step-level groups generated during tree sampling.
arXiv Detail & Related papers (2025-06-05T15:56:38Z)
Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.<n> diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.<n>We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z)
Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method. Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel. An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
Inference with Mondrian Random Forests [6.97762648094816]
We give precise bias and variance characterizations, along with a Berry-Esseen-type central limit theorem, for the Mondrian random forest regression estimator.<n>We present valid statistical inference methods for the unknown regression function.<n> Efficient and implementable algorithms are devised for both batch and online learning settings.
arXiv Detail & Related papers (2023-10-15T01:41:42Z)
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of exchanges. We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
Minimax Rates for High-Dimensional Random Tessellation Forests [0.0]
Mondrian forests is the first class of random forests for which minimax rates were obtained in arbitrary dimension. We show that a large class of random forests with general split directions also achieve minimax optimal convergence rates in arbitrary dimension.
arXiv Detail & Related papers (2021-09-22T06:47:38Z)
Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects. We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
adversarial attacks on tree based ensembles such as gradient boosting decision trees (DTs) and random forests (RFs) We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach. Our code is available at https://chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
Random boosting and random^2 forests -- A random tree depth injection approach [0.1749935196721634]
We propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called emphRandom Boost and emphRandom$2$ Forest.
arXiv Detail & Related papers (2020-09-13T20:14:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.