Related papers: InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference

InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference

URL: http://arxiv.org/abs/2505.15391v1
Date: Wed, 21 May 2025 11:28:43 GMT
Title: InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference
Authors: Duncan Bart, Bruno Endres Forlin, Ana-Lucia Varbanescu, Marco Ottavi, Kuan-Hsun Chen,
Abstract summary: InTreeger is an end-to-end framework that takes a training dataset as input, and outputs an architecture-agnostic integer-only C implementation of tree-based machine learning model.<n>This framework enables anyone, even those without prior experience in machine learning, to generate a highly optimized integer-only classification model.
Score: 1.2495506469683937
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Integer quantization has emerged as a critical technique to facilitate deployment on resource-constrained devices. Although they do reduce the complexity of the learning models, their inference performance is often prone to quantization-induced errors. To this end, we introduce InTreeger: an end-to-end framework that takes a training dataset as input, and outputs an architecture-agnostic integer-only C implementation of tree-based machine learning model, without loss of precision. This framework enables anyone, even those without prior experience in machine learning, to generate a highly optimized integer-only classification model that can run on any hardware simply by providing an input dataset and target variable. We evaluated our generated implementations across three different architectures (ARM, x86, and RISC-V), resulting in significant improvements in inference latency. In addition, we show the energy efficiency compared to typical decision tree implementations that rely on floating-point arithmetic. The results underscore the advantages of integer-only inference, making it particularly suitable for energy- and area-constrained devices such as embedded systems and edge computing platforms, while also enabling the execution of decision trees on existing ultra-low power devices.

Related papers

TreeLUT: An Efficient Alternative to Deep Neural Networks for Inference Acceleration Using Gradient Boosted Decision Trees [0.6906005491572401]
We present TreeLUT, an open-source tool for implementing gradient boosted decision trees (GBDTs) on FPGAs.<n>We show the effectiveness of TreeLUT using multiple datasets classification, commonly used to evaluate ultra-low area and latency.<n>Our results show that TreeLUT significantly improves hardware utilization, latency, and throughput at competitive accuracy compared to previous works.
arXiv Detail & Related papers (2025-01-02T19:38:07Z)
Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation [3.737361598712633]
We present a code generation approach for decision tree ensembles, which produces machine assembly code within a single conversion step. The results show that the performance of decision tree ensemble inference can be significantly improved.
arXiv Detail & Related papers (2024-04-10T09:17:22Z)
Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models [1.5807079236265718]
KEN is a straightforward, universal and unstructured pruning algorithm based on Kernel Density Estimation (KDE) Ken aims to construct optimized transformers by selectively preserving the most significant parameters while restoring others to their pre-training state. Ken achieves equal or better performance than their original unpruned versions, with a minimum parameter reduction of 25%.
arXiv Detail & Related papers (2024-02-05T16:11:43Z)
Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling. This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure. Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source-code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple of problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents. Our models perform equivalently to backprop on challenging machine learning benchmarks. Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures [6.556553154231475]
This paper introduces a hardware-aware complexity metric that aims to assist the system designer of the neural network architectures. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices.
arXiv Detail & Related papers (2020-04-19T16:42:51Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees. We propose a new graph transformation technique via node cloning which ensures a run-time for solving our target problem independently of the form of a corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.