A Comparison of Decision Forest Inference Platforms from A Database
Perspective
- URL: http://arxiv.org/abs/2302.04430v1
- Date: Thu, 9 Feb 2023 04:07:50 GMT
- Title: A Comparison of Decision Forest Inference Platforms from A Database
Perspective
- Authors: Hong Guan, Mahidhar Reddy Dwarampudi, Venkatesh Gunda, Hong Min, Lei
Yu, Jia Zou
- Abstract summary: Decision forest is one of the most popular machine learning techniques used in many industrial scenarios, such as credit card fraud detection, ranking, and business intelligence.
A number of frameworks have been developed specifically for decision forest inference, such as ONNX, TreeLite from Amazon, TensorFlow Decision Forest from Google, HummingBird from Microsoft, Nvidia FIL, and lleaves.
- Score: 4.873098180823506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision forest, including RandomForest, XGBoost, and LightGBM, is one of the
most popular machine learning techniques used in many industrial scenarios,
such as credit card fraud detection, ranking, and business intelligence.
Because the inference process is usually performance-critical, a number of
frameworks have been developed specifically for decision forest inference, such as
ONNX, TreeLite from Amazon, TensorFlow Decision Forest from Google, HummingBird
from Microsoft, Nvidia FIL, and lleaves. However, these frameworks are all
decoupled from data management frameworks, so it is unclear whether in-database
inference will improve the overall performance. In addition, these frameworks
use different algorithms, optimization techniques, and parallelism models. It
is unclear how these implementations affect the overall performance and
how to make design decisions for an in-database inference framework.
In this work, we investigated the above questions by comprehensively
comparing the end-to-end performance of the aforementioned inference frameworks
and netsDB, an in-database inference framework we implemented. Through this
study, we identified that netsDB is best suited for handling small-scale models
on large-scale datasets and all-scale models on small-scale datasets, for which
it achieved speedups of up to hundreds of times. In addition, the
relation-centric representation we proposed significantly improved netsDB's
performance in handling large-scale models, while the model reuse optimization
we proposed further improved netsDB's performance in handling small-scale
datasets.
Related papers
- A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
- Transformer Architecture for NetsDB [0.0]
We create an end-to-end implementation of a transformer for deep learning model serving in NetsDB.
We load out weights from our model for distributed processing, deployment, and efficient inferencing.
arXiv Detail & Related papers (2024-05-08T04:38:36Z)
- Implicit Generative Prior for Bayesian Neural Networks [8.013264410621357]
We propose a novel neural adaptive empirical Bayes (NA-EB) framework for complex data structures.
The proposed NA-EB framework combines variational inference with a gradient ascent algorithm.
We demonstrate the practical applications of our framework through extensive evaluations on a variety of tasks.
arXiv Detail & Related papers (2024-04-27T21:00:38Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest [0.0]
In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance.
We experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams.
We conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets.
arXiv Detail & Related papers (2022-10-11T18:05:58Z)
- Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank.
Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Mapping the Internet: Modelling Entity Interactions in Complex Heterogeneous Networks [0.0]
We propose a versatile, unified framework called HMill for sample representation, model definition and training.
We show an extension of the universal approximation theorem to the set of all functions realized by models implemented in the framework.
We solve three different problems from the cybersecurity domain using the framework.
arXiv Detail & Related papers (2021-04-19T21:32:44Z)
- Probabilistic Case-based Reasoning for Open-World Knowledge Graph Completion [59.549664231655726]
A case-based reasoning (CBR) system solves a new problem by retrieving 'cases' that are similar to the given problem.
In this paper, we demonstrate that such a system is achievable for reasoning in knowledge-bases (KBs).
Our approach predicts attributes for an entity by gathering reasoning paths from similar entities in the KB.
arXiv Detail & Related papers (2020-10-07T17:48:12Z)
- ENTMOOT: A Framework for Optimization over Ensemble Tree Models [57.98561336670884]
ENTMOOT is a framework for integrating tree models into larger optimization problems.
We show how ENTMOOT allows a simple integration of tree models into decision-making and black-box optimization.
arXiv Detail & Related papers (2020-03-10T14:34:07Z)