Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation
- URL: http://arxiv.org/abs/2404.06846v1
- Date: Wed, 10 Apr 2024 09:17:22 GMT
- Title: Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation
- Authors: Daniel Biebert, Christian Hakert, Kuan-Hsun Chen, Jian-Jia Chen
- Abstract summary: We present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step.
The results show that the performance of decision tree ensemble inference can be significantly improved.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bringing high-level machine learning models to efficient, well-suited machine implementations typically involves a chain of tools, e.g.~code generators, compilers, and optimizers. Each stage of such a tool chain introduces abstractions, which can leave CPU registers suboptimally utilized. This is a shortcoming especially in resource-constrained embedded setups. In this work, we present a code generation approach for decision tree ensembles, which produces machine assembly code in a single conversion step directly from the high-level model representation. Specifically, we develop various approaches to effectively allocate registers for the inference of decision tree ensembles. We evaluate the proposed method extensively against the baseline of generating C code from the high-level machine learning model and compiling it subsequently. The results show that the performance of decision tree ensemble inference can be significantly improved (by up to $\approx1.6\times$) if the methods are applied carefully to the appropriate scenario.
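To illustrate the register pressure the abstract alludes to, here is a minimal C sketch (not the paper's generated assembly; all names and values are hypothetical): a generic array-based traversal reloads thresholds and child indices from memory at every node, whereas an unrolled tree keeps its constants in locals that a compiler can assign to registers.

```c
/* Generic array-based traversal: every threshold and child index is
 * loaded from memory on each step. */
typedef struct { int feature; float threshold; int left, right; } Node;

float predict_array(const Node *nodes, const float *leaf_values,
                    const float *x, int num_inner) {
    int i = 0;
    while (i < num_inner) {                 /* inner nodes: 0..num_inner-1 */
        const Node *n = &nodes[i];
        i = (x[n->feature] <= n->threshold) ? n->left : n->right;
    }
    return leaf_values[i - num_inner];      /* leaves follow inner nodes */
}

/* Unrolled depth-2 tree: thresholds live in locals, which the compiler
 * can keep in registers across the whole evaluation (hypothetical
 * constants, for illustration only). */
float predict_unrolled(const float *x) {
    const float t0 = 0.5f, t1 = -1.2f, t2 = 3.7f;
    if (x[2] <= t0)
        return (x[0] <= t1) ? 0.1f : 0.9f;
    else
        return (x[5] <= t2) ? 0.3f : 0.7f;
}
```

The paper's contribution is to make this register assignment explicit at the assembly level, rather than leaving it to the heuristics of a general-purpose compiler.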
Related papers
- TREE: Tree Regularization for Efficient Execution [4.205565040528205]
We present a method to reduce path lengths in decision trees by rewarding uneven probability distributions during training.
Specifically, we regularize the impurity measure of the CART algorithm so that the evaluation of split criteria favors not only low impurity but also highly asymmetric distributions.
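The exact regularizer is defined in the paper; as a rough sketch of the idea, one can score candidate splits by weighted Gini impurity plus a penalty term that is minimized by highly asymmetric splits (the form of the penalty and the name `split_score` are assumptions here):

```c
/* Gini impurity of a binary class distribution. */
static double gini(double pos, double total) {
    if (total == 0.0) return 0.0;
    double p = pos / total;
    return 2.0 * p * (1.0 - p);
}

/* Regularized split score: standard weighted Gini plus a penalty that
 * is smallest for highly asymmetric splits. `lambda` trades accuracy
 * against expected path length (hypothetical form, not the paper's
 * exact regularizer). Lower is better. */
double split_score(double lpos, double ltot, double rpos, double rtot,
                   double lambda) {
    double n = ltot + rtot;
    double impurity = (ltot / n) * gini(lpos, ltot)
                    + (rtot / n) * gini(rpos, rtot);
    double balance = (ltot / n) * (rtot / n);   /* max 0.25 at 50/50 */
    return impurity + lambda * balance;
}
```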
arXiv Detail & Related papers (2024-06-18T12:01:06Z) - Should AI Optimize Your Code? A Comparative Study of Current Large Language Models Versus Classical Optimizing Compilers [0.0]
Large Language Models (LLMs) raise intriguing questions about the potential for AI-driven approaches to revolutionize code optimization methodologies.
This paper presents a comparative analysis between two state-of-the-art Large Language Models, GPT-4.0 and CodeLlama-70B, and traditional optimizing compilers.
arXiv Detail & Related papers (2024-06-17T23:26:41Z) - Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures [67.47328776279204]
This work introduces a framework to develop efficient, portable Deep Learning and High Performance Computing kernels.
We decompose kernel development into two steps: 1) expressing the computational core using Tensor Processing Primitives (TPPs) and 2) expressing the logical loops around TPPs in a high-level, declarative fashion.
We demonstrate the efficacy of our approach using standalone kernels and end-to-end workloads that outperform state-of-the-art implementations on diverse CPU platforms.
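TPPs are a concrete library concept; the sketch below is only a schematic analogue of the two-step decomposition in plain C, not the actual TPP API: a small fixed-size primitive (a 4x4 GEMM microkernel) separated from the logical loops that tile the full problem.

```c
#define TM 4
#define TN 4

/* The "primitive": dense, fixed-size, easy to hand-optimize or JIT
 * for a given ISA. Matrices are row-major. */
static void gemm_tpp_4x4(const float *A, const float *B, float *C,
                         int K, int lda, int ldb, int ldc) {
    for (int i = 0; i < TM; ++i)
        for (int j = 0; j < TN; ++j) {
            float acc = C[i * ldc + j];
            for (int k = 0; k < K; ++k)
                acc += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = acc;
        }
}

/* The "logical loops": iterate over tiles. M and N are assumed to be
 * multiples of the tile sizes for brevity. */
void gemm(const float *A, const float *B, float *C, int M, int N, int K) {
    for (int i = 0; i < M; i += TM)
        for (int j = 0; j < N; j += TN)
            gemm_tpp_4x4(&A[i * K], &B[j], &C[i * N + j], K, K, N, N);
}
```

Keeping the primitive small and fixed-size is what makes it portable: only the microkernel needs retuning per platform, while the logical loops stay unchanged.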
arXiv Detail & Related papers (2023-04-25T05:04:44Z) - HDCC: A Hyperdimensional Computing compiler for classification on embedded systems and high-performance computing [58.720142291102135]
This work introduces HDCC, the first open-source compiler that translates high-level descriptions of HDC classification methods into optimized C code.
HDCC is designed like a modern compiler, featuring an intuitive and descriptive input language, an intermediate representation (IR), and a retargetable backend.
To substantiate these design claims, we conducted experiments with HDCC on several of the most popular datasets in the HDC literature.
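HDCC's generated code is not reproduced here; the following is a minimal, self-contained sketch of the kind of HDC classification kernel such a compiler targets: binary hypervectors, XOR binding, and nearest-prototype lookup under Hamming distance. Dimensions and names are illustrative, and `__builtin_popcountll` assumes GCC/Clang.

```c
#include <stdint.h>
#include <limits.h>

#define DIM_WORDS 157            /* ~10,048-bit hypervectors (64 * 157) */
#define NUM_CLASSES 4

typedef struct { uint64_t w[DIM_WORDS]; } HV;

/* Bind two hypervectors with XOR (self-inverse, distance-preserving). */
static void bind(const HV *a, const HV *b, HV *out) {
    for (int i = 0; i < DIM_WORDS; ++i)
        out->w[i] = a->w[i] ^ b->w[i];
}

/* Hamming distance via popcount (GCC/Clang builtin). */
static int hamming(const HV *a, const HV *b) {
    int d = 0;
    for (int i = 0; i < DIM_WORDS; ++i)
        d += __builtin_popcountll(a->w[i] ^ b->w[i]);
    return d;
}

/* Classify: nearest class prototype under Hamming distance. */
int classify(const HV *query, const HV prototypes[NUM_CLASSES]) {
    int best = 0, best_d = INT_MAX;
    for (int c = 0; c < NUM_CLASSES; ++c) {
        int d = hamming(query, &prototypes[c]);
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;
}
```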
arXiv Detail & Related papers (2023-04-24T19:16:03Z) - bsnsing: A decision tree induction method based on recursive optimal boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process.
It develops an efficient search solver that is able to solve practical instances faster than commercial solvers.
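bsnsing's exact MIP is given in the paper; the display below only conveys the generic flavor of a boolean rule-composition formulation, where $z_j$ selects condition $j$ into an OR-rule, $y_i$ indicates whether sample $i$ satisfies the rule, $a_{ij}$ is data, and $\mathcal{P}$/$\mathcal{N}$ are the positive/negative samples:

```latex
% Illustrative split-rule-selection MIP, not bsnsing's exact model.
% z_j = 1 iff condition j enters the OR-rule; y_i = 1 iff sample i
% satisfies the rule; a_{ij} = 1 iff sample i satisfies condition j (data).
\begin{aligned}
\min_{z,\,y}\quad & \sum_{i \in \mathcal{N}} y_i \;+\; \sum_{i \in \mathcal{P}} (1 - y_i) \\
\text{s.t.}\quad  & y_i \ge a_{ij}\, z_j  && \forall i,\ \forall j \\
                  & y_i \le \sum_j a_{ij}\, z_j && \forall i \\
                  & z_j,\, y_i \in \{0,1\}.
\end{aligned}
```

The two constraint families force $y_i = 1$ exactly when at least one selected condition covers sample $i$, so the objective counts misclassified samples on both sides of the split.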
arXiv Detail & Related papers (2022-05-30T17:13:57Z) - Optimization of Decision Tree Evaluation Using SIMD Instructions [0.0]
We explore MatrixNet, the ancestor of the popular CatBoost library.
This paper investigates the opportunities offered by the AVX instruction set to evaluate models more efficiently.
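As a sketch of the technique (not MatrixNet's actual code), AVX lets one oblivious tree of depth 8 be evaluated with a single vector comparison, since the eight comparison bits directly form the leaf index; pre-gathering the eight feature values into `vals` is assumed here:

```c
#include <immintrin.h>

/* Evaluate one oblivious tree of depth 8 (CatBoost-style: one
 * (feature, threshold) pair per level, leaf index built from the
 * per-level comparison bits). Compile with AVX enabled. */
float eval_oblivious_tree_avx(const float vals[8],
                              const float thresholds[8],
                              const float leaves[256]) {
    __m256 v = _mm256_loadu_ps(vals);
    __m256 t = _mm256_loadu_ps(thresholds);
    /* bit i set iff vals[i] > thresholds[i] (take the right branch) */
    __m256 gt = _mm256_cmp_ps(v, t, _CMP_GT_OQ);
    int leaf = _mm256_movemask_ps(gt);       /* 8 sign bits -> leaf index */
    return leaves[leaf];
}
```

One `_mm256_cmp_ps`/`_mm256_movemask_ps` pair replaces eight scalar compare-and-branch sequences, and the branchless leaf lookup avoids misprediction penalties.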
arXiv Detail & Related papers (2022-05-15T15:12:40Z) - Improved Learning Bounds for Branch-and-Cut [98.92725321081994]
Branch-and-cut is the most widely used algorithm for solving integer programs.
Its performance depends on many tunable parameters, and an increasingly popular approach is to use machine learning to tune them.
In this paper, we prove sample complexity guarantees for this procedure.
arXiv Detail & Related papers (2021-11-18T04:07:29Z) - Learning to Superoptimize Real-world Programs [79.4140991035247]
We propose a framework to learn to superoptimize real-world programs by using neural sequence-to-sequence models.
We introduce the Big Assembly benchmark, a dataset consisting of over 25K real-world functions mined from open-source projects in x86-64 assembly.
arXiv Detail & Related papers (2021-09-28T05:33:21Z) - Simple is better: Making Decision Trees faster using random sampling [4.284674689172996]
In recent years, gradient-boosted decision trees have become popular for building robust machine learning models on big data.
We show theoretically and empirically that choosing the split points uniformly at random provides the same or even better performance in terms of accuracy and computational efficiency.
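The core idea is cheap to sketch: instead of sorting feature values and scanning every candidate threshold, draw one uniformly at random from the feature's observed range (function name and interface are illustrative):

```c
#include <stdlib.h>

/* Pick a split threshold uniformly at random in [min, max] of the
 * feature's observed range, instead of evaluating all candidate split
 * points. A sketch of the idea; assumes n >= 1 and that the caller
 * has seeded rand(). */
float random_split(const float *feature_vals, int n) {
    float lo = feature_vals[0], hi = feature_vals[0];
    for (int i = 1; i < n; ++i) {
        if (feature_vals[i] < lo) lo = feature_vals[i];
        if (feature_vals[i] > hi) hi = feature_vals[i];
    }
    double u = (double)rand() / RAND_MAX;     /* uniform in [0, 1] */
    return lo + (float)(u * (hi - lo));
}
```

This replaces an O(n log n) sort-and-scan per feature with a single O(n) pass, which is where the computational savings come from.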
arXiv Detail & Related papers (2021-08-19T17:00:21Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
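The heart of such DP approaches is a simple recursion: the optimal error at a given depth is either the leaf error or the best split's sum of optimal subtree errors one level down. The sketch below shows only that recursion for binary features and binary labels; MurTree's memoization, bounds, and specialized depth-2 solver are all omitted.

```c
#include <stdlib.h>

/* Optimal misclassification error of a depth-limited tree on the
 * samples listed in idx[0..n-1]. X[s][f] is the binary value of
 * feature f for sample s; labels[s] is 0 or 1. Illustrative sketch
 * only: exponential without caching, and allocation errors are
 * not handled. */
int optimal_error(const unsigned char **X, const int *labels,
                  const int *idx, int n, int num_feat, int depth) {
    int pos = 0;
    for (int i = 0; i < n; ++i) pos += labels[idx[i]];
    int leaf = pos < n - pos ? pos : n - pos;   /* best single-leaf error */
    if (depth == 0 || leaf == 0) return leaf;

    int best = leaf;
    int *l = malloc(n * sizeof(int)), *r = malloc(n * sizeof(int));
    for (int f = 0; f < num_feat; ++f) {
        int nl = 0, nr = 0;
        for (int i = 0; i < n; ++i) {
            if (X[idx[i]][f]) r[nr++] = idx[i];
            else              l[nl++] = idx[i];
        }
        int e = optimal_error(X, labels, l, nl, num_feat, depth - 1)
              + optimal_error(X, labels, r, nr, num_feat, depth - 1);
        if (e < best) best = e;
    }
    free(l); free(r);
    return best;
}
```

The speedups reported in the paper come from caching subproblem solutions keyed on the dataset, pruning with lower bounds, and solving depth-2 subtrees in closed form instead of recursing.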
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high-performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid approach, pairing compiler-generated code with minimal library use, achieves state-of-the-art performance.
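As an illustration of the kind of transformation that data reuse analysis justifies (not PolyDL's generated code), tiling a matrix multiplication keeps blocks of the operands cache-resident; a polyhedral compiler derives tile sizes and loop orders from the reuse analysis automatically:

```c
/* Tiled matrix multiplication: all accesses inside the three inner
 * loops stay within T x T blocks of A, B and C, so each block is
 * reused from cache many times before being evicted. The tile size
 * is illustrative; N is assumed to be a multiple of T for brevity. */
#define T 32

void matmul_tiled(int N, const float *A, const float *B, float *C) {
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; ++i)
                    for (int j = jj; j < jj + T; ++j) {
                        float acc = C[i * N + j];
                        for (int k = kk; k < kk + T; ++k)
                            acc += A[i * N + k] * B[k * N + j];
                        C[i * N + j] = acc;
                    }
}
```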
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.