PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree
Ensemble Deployment
- URL: http://arxiv.org/abs/2011.05383v1
- Date: Tue, 10 Nov 2020 20:32:11 GMT
- Title: PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree
Ensemble Deployment
- Authors: Meghana Madhyastha, Kunal Lillaney, James Browne, Joshua Vogelstein,
Randal Burns
- Abstract summary: We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory.
Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from external memory algorithms.
The result is that each I/O yields a higher fraction of useful data, leading to a 2-6 times reduction in classification latency for interactive workloads.
- Score: 4.314299343332365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present methods to serialize and deserialize tree ensembles that optimize
inference latency when models are not already loaded into memory. This arises
whenever models are larger than memory, but also systematically when models are
deployed on low-resource devices, such as in the Internet of Things, or run as
Web micro-services where resources are allocated on demand. Our packed
serialized trees (PACSET) encode reference locality in the layout of a tree
ensemble using principles from external memory algorithms. The layout
interleaves correlated nodes across multiple trees, uses leaf cardinality to
collocate the nodes on the most popular paths and is optimized for the I/O
blocksize. The result is that each I/O yields a higher fraction of useful data,
leading to a 2-6 times reduction in classification latency for interactive
workloads.
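The layout described above (interleaving correlated nodes across trees, collocating popular paths by leaf cardinality, and aligning to the I/O block size) can be illustrated with a minimal sketch. This is not the paper's actual data structure or algorithm: the `Node` class, the greedy frequency ordering, and the block/node sizes are all illustrative assumptions.

```python
# Hedged sketch of block-aware tree-ensemble serialization in the spirit of
# PACSET. Node, the sizes, and the greedy frequency heuristic are illustrative
# assumptions, not the paper's actual layout algorithm.
from dataclasses import dataclass, field

BLOCK_SIZE = 4096   # assumed I/O block size in bytes
NODE_SIZE = 32      # assumed serialized size of one node in bytes

@dataclass
class Node:
    id: int
    freq: int                           # samples reaching this node (leaf-cardinality proxy)
    children: list = field(default_factory=list)

def pack_hot_paths_first(roots):
    """Order nodes so frequently visited nodes across ALL trees share early blocks."""
    order, frontier = [], list(roots)
    while frontier:
        # Greedily pick the most popular reachable node across all trees,
        # interleaving correlated nodes rather than laying out one tree at a time.
        frontier.sort(key=lambda n: n.freq, reverse=True)
        node = frontier.pop(0)
        order.append(node)
        frontier.extend(node.children)
    # Group the ordering into I/O-block-sized pages, so each read of one
    # block brings in a high fraction of nodes that inference will touch.
    per_block = BLOCK_SIZE // NODE_SIZE
    return [order[i:i + per_block] for i in range(0, len(order), per_block)]
```

With two small trees, the resulting order visits both roots first, then the popular children of each tree, pushing rarely taken branches toward later blocks.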
Related papers
- ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.
Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.
Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z)
- Autoregressive Generation of Static and Growing Trees [49.93294993975928]
We propose a transformer architecture and training strategy for tree generation.
The architecture processes data at multiple resolutions and has an hourglass shape, with middle layers processing fewer tokens than outer layers.
We extend this approach to perform image-to-tree and point-cloud-to-tree conditional generation and to simulate the tree growth processes, generating 4D trees.
arXiv Detail & Related papers (2025-02-07T08:51:14Z)
- Decision Trees That Remember: Gradient-Based Learning of Recurrent Decision Trees with Memory [1.4487264853431878]
We introduce ReMeDe Trees, a novel recurrent DT architecture that integrates an internal memory mechanism, similar to RNNs, to learn long-term dependencies in sequential data.
Our model learns hard, axis-aligned decision rules for both output generation and state updates, optimizing them efficiently via gradient descent.
arXiv Detail & Related papers (2025-02-06T13:11:50Z)
- TREE: Tree Regularization for Efficient Execution [4.205565040528205]
We present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees.
Specifically, we regularize the impurity of the CART algorithm in order to favor not only low impurity, but also highly asymmetric distributions for the evaluation of split criteria.
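The split-scoring idea summarized above can be sketched as follows. The exact regularizer and its weight `alpha` are illustrative assumptions, not the paper's formula; the sketch only shows the shape of the idea: penalize impurity as CART does, but also reward uneven child sizes so that expected path lengths shrink.

```python
# Hedged sketch of TREE-style split scoring: regularize CART's impurity
# criterion to also reward asymmetric (uneven) sample distributions between
# children. `alpha` and the asymmetry term are illustrative assumptions.

def gini(counts):
    """Gini impurity of a class-count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def regularized_split_score(left_counts, right_counts, alpha=0.5):
    """Lower is better: weighted child impurity minus an asymmetry reward."""
    n_l, n_r = sum(left_counts), sum(right_counts)
    n = n_l + n_r
    impurity = (n_l / n) * gini(left_counts) + (n_r / n) * gini(right_counts)
    # |n_l - n_r| / n is 0 for a balanced split and approaches 1 for a highly
    # asymmetric one; subtracting it favors splits that shorten expected paths.
    asymmetry = abs(n_l - n_r) / n
    return impurity - alpha * asymmetry
```

Under this score, a pure but highly uneven split beats a balanced split of equal impurity, which is the behavior the summary describes.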
arXiv Detail & Related papers (2024-06-18T12:01:06Z)
- Forecasting with Hyper-Trees [50.72190208487953]
Hyper-Trees are designed to learn the parameters of time series models.
By relating the parameters of a target time series model to features, Hyper-Trees also address the issue of parameter non-stationarity.
In this novel approach, the trees first generate informative representations from the input features, which a shallow network then maps to the target model parameters.
arXiv Detail & Related papers (2024-05-13T15:22:15Z)
- ForestPrune: Compact Depth-Controlled Tree Ensembles [7.538482310185135]
We present ForestPrune, a novel framework to post-process tree ensembles by pruning depth layers from individual trees.
We develop a specialized optimization algorithm to efficiently obtain high-quality solutions to problems under ForestPrune.
Our experiments demonstrate that ForestPrune produces parsimonious models that outperform models extracted by existing post-processing algorithms.
arXiv Detail & Related papers (2022-05-31T22:04:18Z)
- Point Cloud Compression with Sibling Context and Surface Priors [47.96018990521301]
We present a novel octree-based multi-level framework for large-scale point cloud compression.
In this framework, we propose a new entropy model that explores the hierarchical dependency in an octree.
We locally fit surfaces with a voxel-based geometry-aware module to provide geometric priors in entropy encoding.
arXiv Detail & Related papers (2022-05-02T09:13:26Z)
- Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation [141.16965264264195]
Sparsely annotated semantic segmentation (SASS) aims to train a segmentation network with coarse-grained supervisions.
We propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels.
arXiv Detail & Related papers (2022-03-21T05:16:23Z)
- Shrub Ensembles for Online Classification [7.057937612386993]
Decision Tree (DT) ensembles provide excellent performance while adapting to changes in the data, but they are not resource-efficient.
We propose shrub ensembles, a novel memory-efficient online classification ensemble for resource-constrained systems.
Our algorithm trains small to medium-sized decision trees on small windows and uses gradient descent to learn the ensemble weights of these shrubs.
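Learning ensemble weights by gradient descent, as the summary describes, can be sketched with frozen trees treated as black-box scorers. The squared-error loss, softmax weighting, and step size here are illustrative assumptions, not the paper's actual training objective.

```python
# Hedged sketch of gradient-descent ensemble weighting. Trees are frozen
# black-box scorers; the loss, learning rate, and softmax parameterization
# are illustrative assumptions rather than the paper's exact method.
import math

def softmax(w):
    m = max(w)
    e = [math.exp(x - m) for x in w]
    s = sum(e)
    return [x / s for x in e]

def learn_weights(tree_probs, labels, lr=0.5, steps=200):
    """tree_probs[t][i]: tree t's probability that sample i is class 1."""
    n_trees, n = len(tree_probs), len(labels)
    w = [0.0] * n_trees                       # logits of the ensemble weights
    for _ in range(steps):
        a = softmax(w)
        # Ensemble prediction: weighted average of the trees' outputs.
        pred = [sum(a[t] * tree_probs[t][i] for t in range(n_trees))
                for i in range(n)]
        # Gradient of the mean squared error w.r.t. the logits w.
        grad = []
        for t in range(n_trees):
            g = 0.0
            for i in range(n):
                d_pred = a[t] * (tree_probs[t][i] - pred[i])  # d pred_i / d w_t
                g += 2.0 * (pred[i] - labels[i]) * d_pred / n
            grad.append(g)
        w = [wt - lr * g for wt, g in zip(w, grad)]
    return softmax(w)
```

On toy data where one tree matches the labels and another contradicts them, the learned weights concentrate on the accurate tree.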
arXiv Detail & Related papers (2021-12-07T14:22:43Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec achieves 4 to 8 times compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.