Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement
- URL: http://arxiv.org/abs/2110.10075v1
- Date: Tue, 19 Oct 2021 16:06:43 GMT
- Title: Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement
- Authors: Sebastian Buschjäger, Katharina Morik
- Abstract summary: Random Forests (RF) are among the state-of-the-art in many machine learning applications.
We show that the improvement effects of pruning diminish for ensembles of large trees but that pruning has an overall better accuracy-memory trade-off than RF.
We present a simple, yet surprisingly effective algorithm that refines the predictions in the leaf nodes of the forest via gradient descent.
- Score: 6.967385165474138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Forests (RF) are among the state-of-the-art in many machine learning
applications. With the ongoing integration of ML models into everyday life,
deploying and continuously running models becomes an increasingly important
issue. Hence, small models that offer good predictive performance while using
little memory are required. Ensemble pruning is a standard
technique to remove unnecessary classifiers from an ensemble to reduce the
overall resource consumption and sometimes even improve the performance of the
original ensemble. In this paper, we revisit ensemble pruning in the context of
`modernly' trained Random Forests where trees are very large. We show that the
improvement effects of pruning diminish for ensembles of large trees but that
pruning has an overall better accuracy-memory trade-off than RF. However,
pruning does not offer fine-grained control over this trade-off because it
removes entire trees from the ensemble. To further improve the accuracy-memory
trade-off we present a simple, yet surprisingly effective algorithm that
refines the predictions in the leaf nodes of the forest via stochastic gradient
descent. We evaluate our method against 7 state-of-the-art pruning methods and
show that our method outperforms the other methods on 11 of 16 datasets with a
statistically significantly better accuracy-memory trade-off compared to most
methods. We conclude our experimental evaluation with a case study showing that
our method can be applied in a real-world setting.
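The leaf-refinement idea lends itself to a compact implementation. Below is a minimal sketch, assuming a scikit-learn RandomForestClassifier and a simple squared-error loss on one-hot labels; the loss, learning-rate schedule, regularization, and any combination with pruning used in the paper may differ, and all names and hyperparameters here are illustrative.

```python
# Minimal sketch of leaf refinement via SGD (illustrative, not the paper's exact algorithm).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
n_classes = 3
Y = np.eye(n_classes)[y]                                    # one-hot targets

forest = RandomForestClassifier(n_estimators=32, random_state=0).fit(X, y)

# One refinable vector per tree node, initialised with the class distribution
# each tree already stores; only the entries of leaves are ever updated.
leaf_ids = [tree.apply(X) for tree in forest.estimators_]   # leaf index per sample
leaf_w = []
for tree in forest.estimators_:
    counts = tree.tree_.value[:, 0, :]                      # (n_nodes, n_classes)
    leaf_w.append(counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12))

rng = np.random.default_rng(0)
lr, batch_size, n_steps = 0.1, 64, 500
for _ in range(n_steps):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    # Forest output = average of the leaf vectors the batch samples fall into.
    pred = np.mean([w[l[idx]] for w, l in zip(leaf_w, leaf_ids)], axis=0)
    grad = 2.0 * (pred - Y[idx]) / len(forest.estimators_)  # d(squared error)/d(leaf vector)
    for w, l in zip(leaf_w, leaf_ids):
        np.add.at(w, l[idx], -lr * grad)                    # SGD step on the leaves that were hit

refined = np.mean([w[l] for w, l in zip(leaf_w, leaf_ids)], axis=0)
print("refined training accuracy:", (refined.argmax(axis=1) == y).mean())
```

Because only the values stored in existing leaves change, the size of the forest stays essentially the same, which is why refinement offers a finer-grained accuracy-memory trade-off than removing whole trees.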
Related papers
- Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles [45.962492329047215]
We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model.
We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles.
arXiv Detail & Related papers (2024-08-28T23:15:46Z)
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose optimization-based structural pruning for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method runs in 2.7 hours using around 35GB of memory for 13B-parameter models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., their margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and unlike existing data pruning strategies it significantly improves model performance. (A minimal sketch of margin-based pruning appears after this list.)
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Bayesian post-hoc regularization of random forests [0.0]
Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks.
We propose post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root.
We have evaluated the performance of our method on various machine learning data sets.
arXiv Detail & Related papers (2023-06-06T14:15:29Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS for matrix products with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models.
This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data.
We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
- MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- Residual Likelihood Forests [19.97069303172077]
This paper presents a novel ensemble learning approach called Residual Likelihood Forests (RLF).
Our weak learners produce conditional likelihoods that are sequentially optimized using global loss in the context of previous learners.
When compared against several ensemble approaches including Random Forests and Gradient Boosted Trees, RLFs offer a significant improvement in performance.
arXiv Detail & Related papers (2020-11-04T00:59:41Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Movement Pruning: Adaptive Sparsity by Fine-Tuning [115.91907953454034]
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning.
We propose the use of movement pruning, a simple, deterministic first-order weight pruning method.
Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes.
arXiv Detail & Related papers (2020-05-15T17:54:15Z)
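To complement the PUMA entry above, here is a minimal sketch of margin-based data pruning. It is a simplification under stated assumptions: a logistic-regression score margin stands in for the DeepFool-based margin used by PUMA, and dropping the 30% lowest-margin samples is an arbitrary illustrative choice.

```python
# Minimal sketch of margin-based data pruning (score margin instead of DeepFool; illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Fit a reference model and score every training sample.
ref_model = LogisticRegression(max_iter=1000).fit(X, y)
proba = ref_model.predict_proba(X)

# Margin = probability of the true class minus the best competing class.
rows = np.arange(len(y))
true_p = proba[rows, y]
competitors = proba.copy()
competitors[rows, y] = -np.inf
margin = true_p - competitors.max(axis=1)

# Drop the samples closest to the decision boundary and retrain on the rest.
keep = np.argsort(margin)[int(0.3 * len(y)):]
pruned_model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
print("kept", len(keep), "of", len(y), "training samples")
```

Whether low-margin or high-margin samples should be removed depends on the goal (the entry above targets robustness); the sketch only illustrates the mechanics of computing a margin and pruning by it.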