PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree
Ensemble Deployment
- URL: http://arxiv.org/abs/2011.05383v1
- Date: Tue, 10 Nov 2020 20:32:11 GMT
- Title: PACSET (Packed Serialized Trees): Reducing Inference Latency for Tree
Ensemble Deployment
- Authors: Meghana Madhyastha, Kunal Lillaney, James Browne, Joshua Vogelstein,
Randal Burns
- Abstract summary: We present methods to serialize and deserialize tree ensembles that optimize inference latency when models are not already loaded into memory.
Our packed serialized trees (PACSET) encode reference locality in the layout of a tree ensemble using principles from external memory algorithms.
The result is that each I/O yields a higher fraction of useful data, leading to a 2-6 times reduction in classification latency for interactive workloads.
- Score: 4.314299343332365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present methods to serialize and deserialize tree ensembles that optimize
inference latency when models are not already loaded into memory. This arises
whenever models are larger than memory, but also systematically when models are
deployed on low-resource devices, such as in the Internet of Things, or run as
Web micro-services where resources are allocated on demand. Our packed
serialized trees (PACSET) encode reference locality in the layout of a tree
ensemble using principles from external memory algorithms. The layout
interleaves correlated nodes across multiple trees, uses leaf cardinality to
collocate the nodes on the most popular paths, and is optimized for the I/O
block size. The result is that each I/O yields a higher fraction of useful data,
leading to a 2-6 times reduction in classification latency for interactive
workloads.
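The layout ideas above (hot paths first, block alignment) can be sketched for a single tree; this is an illustrative reading, not the PACSET implementation, and all names (`Node`, `pack`, `BLOCK_SIZE`) are hypothetical. The paper additionally interleaves correlated nodes across trees, which this sketch omits.

```python
# Hedged sketch of hot-path packing: order tree nodes so that nodes on
# the most popular root-to-leaf paths (estimated from leaf cardinality)
# occupy a contiguous prefix, then pad the byte stream to the I/O block
# size so each read returns mostly useful data.
import struct

BLOCK_SIZE = 4096   # assumed I/O block size
NODE_BYTES = 16     # feature id, threshold, left index, right index

class Node:
    def __init__(self, feature=-1, threshold=0.0, left=None, right=None, count=0):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.count = count  # training samples reaching this node

def hot_path_order(root):
    """Greedy layout: descend into the child with the larger sample
    count first, so popular paths form a contiguous prefix."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        if node.left is not None:  # CART trees are full: both children exist
            kids = sorted([node.left, node.right], key=lambda n: n.count)
            stack.extend(kids)     # larger-count child is popped first
    return order

def pack(root):
    """Serialize nodes in hot-path order and pad to the block size."""
    order = hot_path_order(root)
    index = {id(n): i for i, n in enumerate(order)}
    buf = bytearray()
    for n in order:
        li = index[id(n.left)] if n.left else -1
        ri = index[id(n.right)] if n.right else -1
        buf += struct.pack("<ifii", n.feature, n.threshold, li, ri)
    pad = (-len(buf)) % BLOCK_SIZE
    return bytes(buf) + b"\x00" * pad
```

With this layout, an interactive classification that follows a popular path touches only the first block(s) of the file, which is the effect the abstract attributes to the 2-6x latency reduction.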
Related papers
- TREE: Tree Regularization for Efficient Execution [4.205565040528205]
We present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees.
Specifically, we regularize the impurity of the CART algorithm in order to favor not only low impurity, but also highly asymmetric distributions for the evaluation of split criteria.
arXiv Detail & Related papers (2024-06-18T12:01:06Z)
- DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference [22.684773338989007]
We introduce an IO-aware tree attention algorithm tailored for tree-structured inference.
DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads.
arXiv Detail & Related papers (2024-03-30T04:34:54Z)
- ForestPrune: Compact Depth-Controlled Tree Ensembles [7.538482310185135]
We present ForestPrune, a novel framework to post-process tree ensembles by pruning depth layers from individual trees.
We develop a specialized optimization algorithm to efficiently obtain high-quality solutions to the pruning problems posed by ForestPrune.
Our experiments demonstrate that ForestPrune produces parsimonious models that outperform models extracted by existing post-processing algorithms.
arXiv Detail & Related papers (2022-05-31T22:04:18Z)
- Point Cloud Compression with Sibling Context and Surface Priors [47.96018990521301]
We present a novel octree-based multi-level framework for large-scale point cloud compression.
In this framework, we propose a new entropy model that explores the hierarchical dependency in an octree.
We locally fit surfaces with a voxel-based geometry-aware module to provide geometric priors in entropy encoding.
arXiv Detail & Related papers (2022-05-02T09:13:26Z)
- Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation [141.16965264264195]
Sparsely annotated semantic segmentation (SASS) aims to train a segmentation network with coarse-grained supervisions.
We propose a novel tree energy loss for SASS by providing semantic guidance for unlabeled pixels.
arXiv Detail & Related papers (2022-03-21T05:16:23Z)
- Shrub Ensembles for Online Classification [7.057937612386993]
Decision Tree (DT) ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient.
We propose a novel memory-efficient online classification ensemble, called shrub ensembles, for resource-constrained systems.
Our algorithm trains small to medium-sized decision trees on small windows and uses gradient descent to learn the ensemble weights of these shrubs.
arXiv Detail & Related papers (2021-12-07T14:22:43Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions (soft routing) rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression [77.8842824702423]
We present a novel deep compression algorithm to reduce the memory footprint of LiDAR point clouds.
Our method exploits the sparsity and structural redundancy between points to reduce the memory footprint.
Our algorithm can be used to reduce the onboard and offboard storage of LiDAR points for applications such as self-driving cars.
arXiv Detail & Related papers (2020-05-14T17:48:49Z)
- A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology for capturing users' dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed CpRec, in which two generic model-shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec achieves 4-8x compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)
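The TREE entry above describes rewarding uneven probability distributions when evaluating CART split criteria, which shortens expected path lengths. One way to read that idea (a hedged sketch; the paper's exact regularizer may differ, and `alpha` and the asymmetry term here are illustrative choices):

```python
# Hedged sketch: a Gini-based split score regularized to favor
# asymmetric splits, in the spirit of the TREE entry above.
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def regularized_split_score(left, right, alpha=0.1):
    """Lower is better: weighted child impurity minus a reward that
    grows as the split becomes more uneven (shorter expected paths)."""
    n = len(left) + len(right)
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    asymmetry = abs(len(left) - len(right)) / n  # 0 = balanced, near 1 = uneven
    return impurity - alpha * asymmetry
```

Under this score, a 9/1 pure split beats a 5/5 pure split, so the tree preferentially peels off large homogeneous groups early, which is what reduces average path length at inference time.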
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.