Deploying Machine Learning Models to Ahead-of-Time Runtime on Edge Using
MicroTVM
- URL: http://arxiv.org/abs/2304.04842v2
- Date: Fri, 14 Apr 2023 14:05:19 GMT
- Title: Deploying Machine Learning Models to Ahead-of-Time Runtime on Edge Using
MicroTVM
- Authors: Chen Liu, Matthias Jobst, Liyuan Guo, Xinyue Shi, Johannes Partzsch,
Christian Mayr
- Abstract summary: We develop an end-to-end code generator parsing a pre-trained model to C source libraries for the backend.
Specific compute-intensive operators can be easily offloaded to the dedicated accelerator.
We conduct a hand gesture recognition experiment on an ARM Cortex M4F core.
- Score: 2.144835105990896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past few years, more and more AI applications have been deployed on
edge devices. However, models trained by data scientists with machine learning
frameworks, such as PyTorch or TensorFlow, cannot be seamlessly executed on
edge devices. In this paper, we develop an end-to-end code generator parsing a
pre-trained model to C source libraries for the backend using MicroTVM, a
machine learning compiler framework extension addressing inference on bare
metal devices. An analysis shows that specific compute-intensive operators can
be easily offloaded to the dedicated accelerator with a Universal Modular
Accelerator (UMA) interface, while others are processed in the CPU cores. By
using the automatically generated ahead-of-time C runtime, we conduct a hand
gesture recognition experiment on an ARM Cortex M4F core.
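The split described in the abstract, where compute-intensive operators go to the accelerator and the rest stay on the CPU cores, can be pictured with a small sketch. This is not the paper's code and does not use the TVM or UMA APIs: the `Op` class, the `partition` helper, the `ACCELERATOR_OPS` set, and the example operator names are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: operator names and the accelerator's supported set are
# illustrative assumptions, not taken from the paper or from TVM's UMA API.
from dataclasses import dataclass

# Assumed set of compute-intensive operator types the accelerator can run.
ACCELERATOR_OPS = frozenset({"conv2d", "dense"})


@dataclass(frozen=True)
class Op:
    name: str  # unique node name in the model graph
    kind: str  # operator type, e.g. "conv2d" or "relu"


def partition(ops, supported=ACCELERATOR_OPS):
    """Split ops into (accelerator, cpu) lists, preserving graph order."""
    accel = [op for op in ops if op.kind in supported]
    cpu = [op for op in ops if op.kind not in supported]
    return accel, cpu


# A toy model graph, listed in execution order.
model = [
    Op("conv1", "conv2d"),
    Op("act1", "relu"),
    Op("pool1", "max_pool2d"),
    Op("fc1", "dense"),
    Op("out", "softmax"),
]

accel_ops, cpu_ops = partition(model)
print("accelerator:", [op.name for op in accel_ops])
print("cpu:", [op.name for op in cpu_ops])
```

In a real flow the decision would be made by the compiler's pattern matching on the model graph rather than by a lookup on operator names; the sketch only shows the shape of the partitioning step.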
Related papers
- Machine Learning for Arbitrary Single-Qubit Rotations on an Embedded Device [1.3753825907341728]
We present a technique for using machine learning (ML) for single-qubit gate synthesis on field programmable logic.
We first bootstrap a model based on simulation with access to the full statevector for measuring gate fidelity.
We next present an algorithm, named adapted randomized benchmarking (ARB), for fine-tuning the gate on hardware based on measurements.
arXiv Detail & Related papers (2024-11-20T04:59:38Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- ML-driven Hardware Cost Model for MLIR [1.2987894327817158]
We develop a machine learning-based cost model for high-level MLIR.
By treating the incoming MLIR as text input, à la NLP models, we can apply well-known techniques from modern NLP research.
We show that these models can provide reasonably good estimates with low error bounds for various hardware characteristics of interest.
arXiv Detail & Related papers (2023-02-14T11:32:47Z)
- End-to-end AI framework for interpretable prediction of molecular and crystal properties [3.8878792624088856]
The framework is based on state-of-the-art AI models including CGCNN, PhysNet, SchNet, MPNN, MPNN-transformer, and TorchMD-NET.
We employ these AI models along with the benchmark QM9, hMOF, and MD17 datasets to showcase how the models can predict user-specified material properties.
arXiv Detail & Related papers (2022-12-21T19:27:51Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- CrypTen: Secure Multi-Party Computation Meets Machine Learning [25.21435023269728]
CrypTen is a software framework that exposes popular secure MPC primitives via abstractions common in modern machine-learning frameworks.
This paper describes the design of CrypTen and measures its performance on state-of-the-art models for text classification, speech recognition, and image classification.
arXiv Detail & Related papers (2021-09-02T14:36:55Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- A Tensor Compiler for Unified Machine Learning Prediction Serving [8.362773007171118]
Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure.
Model scoring is a primary contributor to infrastructure complexity and cost as models are trained once but used many times.
We propose HUMMINGBIRD, a novel approach to model scoring that compiles featurization operators and traditional ML models into a small set of tensor operations.
arXiv Detail & Related papers (2020-10-09T21:02:47Z)
- Neural Network Compression Framework for fast model inference [59.65531492759006]
We present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
It leverages recent advances of various network compression methods and implements some of them, such as sparsity, quantization, and binarization.
The framework can be used within the training samples, which are supplied with it, or as a standalone package that can be seamlessly integrated into the existing training code.
arXiv Detail & Related papers (2020-02-20T11:24:01Z)