Optimization of Decision Tree Evaluation Using SIMD Instructions
- URL: http://arxiv.org/abs/2205.07307v1
- Date: Sun, 15 May 2022 15:12:40 GMT
- Title: Optimization of Decision Tree Evaluation Using SIMD Instructions
- Authors: Alexey Mironov, Ilnur Khuziev
- Abstract summary: We explore MatrixNet, the ancestor of the popular CatBoost library.
This paper investigates the opportunities given by the AVX instruction set to evaluate models more efficiently.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision forest (decision tree ensemble) is one of the most popular machine
learning algorithms. To use large models on big data, like document scoring
with learning-to-rank models, we need to evaluate these models efficiently. In
this paper, we explore MatrixNet, the ancestor of the popular CatBoost library.
Both libraries use the SSE instruction set for scoring on CPU. This paper
investigates the opportunities given by the AVX instruction set to evaluate
models more efficiently. We achieved a 35% speedup on the binarization stage
(node condition comparison) and a 20% speedup on the tree apply stage of the
ranking model.
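To make the two stages concrete, below is a minimal C++ sketch of an AVX binarization step and an oblivious-tree apply step, assuming the oblivious-tree layout used by MatrixNet/CatBoost (every node on a tree level shares the same condition). The function names, data layout, and fixed depth are illustrative assumptions, not the libraries' actual code.

    // Illustrative sketch only: names and data layout are assumptions.
    // Compile with AVX enabled (e.g. -mavx).
    #include <immintrin.h>
    #include <cstdint>

    // Binarization stage: compare one feature of 8 documents against a node
    // border with a single AVX comparison; each document yields one bit.
    inline uint8_t binarize8(const float* feature_values /* 8 docs */, float border) {
        __m256 values = _mm256_loadu_ps(feature_values);
        __m256 mask   = _mm256_cmp_ps(values, _mm256_set1_ps(border), _CMP_GT_OQ);
        return static_cast<uint8_t>(_mm256_movemask_ps(mask)); // 1 bit per document
    }

    // Tree apply stage: in an oblivious tree of depth `depth`, the binary
    // features selected by the tree form the leaf index directly.
    inline double apply_oblivious_tree(const uint8_t* doc_binary_features, // 0/1 per condition
                                       const int* condition_ids,           // condition used at each level
                                       const double* leaf_values,          // 2^depth leaf values
                                       int depth) {
        unsigned leaf = 0;
        for (int level = 0; level < depth; ++level)
            leaf |= static_cast<unsigned>(doc_binary_features[condition_ids[level]]) << level;
        return leaf_values[leaf];
    }

In a real scorer the 8-bit comparison masks would be scattered into per-document binary-feature arrays between the two stages, and the apply loop can itself be vectorized over documents; the sketch shows only the scalar per-document form.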
Related papers
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train, which we show performs better than dense matrices for the same compute on multiple tasks.
arXiv Detail & Related papers (2024-06-10T13:25:43Z) - Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks.
We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs.
We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z) - Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation [3.737361598712633]
We present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step.
The results show that the performance of decision tree ensemble inference can be significantly improved.
arXiv Detail & Related papers (2024-04-10T09:17:22Z) - Towards Constituting Mathematical Structures for Learning to Optimize [101.80359461134087]
Learning to Optimize (L2O), which uses machine learning to learn an optimization algorithm automatically from data, has attracted growing attention in recent years.
A generic L2O approach parameterizes the iterative update rule and learns the update direction as a black-box network.
While the generic approach is widely applicable, the learned model can overfit and may not generalize well to out-of-distribution test sets.
We propose a novel L2O model with a mathematics-inspired structure that is broadly applicable and generalizes well to out-of-distribution problems.
arXiv Detail & Related papers (2023-05-29T19:37:28Z) - Fast Inference of Tree Ensembles on ARM Devices [6.995377781193234]
We convert the popular QuickScorer algorithm and its siblings from Intel's AVX to ARM's NEON instruction set.
We also investigate the effects of using fixed-point quantization in Random Forests.
arXiv Detail & Related papers (2023-05-15T12:05:03Z) - Improving Dual-Encoder Training through Dynamic Indexes for Negative
Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
On datasets with more than twenty million targets, our approach halves the error relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z) - Towards a learning-based performance modeling for accelerating Deep
Neural Networks [1.1549572298362785]
We investigate predictive models based on machine learning techniques to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all the convolution operators manually selected by the library.
arXiv Detail & Related papers (2022-12-09T18:28:07Z) - Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA
Models [5.131758478675364]
Low-order functional ANOVA models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning.
We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance.
We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.
arXiv Detail & Related papers (2022-07-14T14:23:14Z) - Cortex: A Compiler for Recursive Deep Learning Models [12.307249556836375]
We present Cortex, a compiler-based approach to generate highly-efficient code for deep learning models.
Our compiler approach and low reliance on vendor libraries enables us to perform end-to-end optimizations, leading to up to 14X lower inference latencies over past work.
arXiv Detail & Related papers (2020-11-02T23:35:14Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Heuristic Semi-Supervised Learning for Graph Generation Inspired by
Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points and consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)