TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
- URL: http://arxiv.org/abs/2010.08678v3
- Date: Sat, 13 Mar 2021 13:41:01 GMT
- Title: TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
- Authors: Robert David, Jared Duke, Advait Jain, Vijay Janapa Reddi, Nat
Jeffries, Jian Li, Nick Kreeger, Ian Nappier, Meghna Natraj, Shlomi Regev,
Rocky Rhodes, Tiezhen Wang, Pete Warden
- Abstract summary: Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent.
We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems.
- Score: 5.188829601887422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning inference on embedded devices is a burgeoning field with myriad
applications because tiny embedded devices are omnipresent. But we must
overcome major challenges before we can benefit from this opportunity. Embedded
processors are severely resource constrained. Their nearest mobile counterparts
exhibit at least a 100-1,000x difference in compute capability, memory
availability, and power consumption. As a result, the machine-learning (ML)
models and associated ML inference framework must not only execute efficiently
but also operate in a few kilobytes of memory. Also, the embedded devices'
ecosystem is heavily fragmented. To maximize efficiency, system vendors often
omit many features that are common in mainstream systems and that enable
cross-platform interoperability, such as dynamic memory allocation and virtual
memory. The hardware also comes in many flavors (e.g., instruction-set
architecture and FPU support, or lack thereof). We introduce TensorFlow Lite
Micro (TF Micro), an open-source ML inference framework for running
deep-learning models on embedded systems. TF Micro tackles the efficiency
requirements imposed by embedded-system resource constraints and the
fragmentation challenges that make cross-platform interoperability nearly
impossible. The framework adopts a unique interpreter-based approach that
provides flexibility while overcoming these challenges. This paper explains the
design decisions behind TF Micro and describes its implementation details.
Also, we present an evaluation to demonstrate its low resource requirements and
minimal run-time performance overhead.
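The interpreter-based design shows up directly in the public TF Micro C++ API: the application links the model in as a byte array, hands the interpreter a caller-owned, statically allocated tensor arena, and registers only the kernels the model needs, so nothing is heap-allocated at runtime. The sketch below is a minimal illustration; the model array, the 8 KB arena size, and the two-op float model are assumptions, and constructor details have varied across TF Micro releases (older versions also took an ErrorReporter).

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Assumed: a flatbuffer produced by the TFLite converter and compiled into
// the binary as a C array (e.g., via xxd).
extern const unsigned char g_model_data[];

// All tensor memory comes from one static arena; TF Micro performs no
// dynamic allocation at runtime, which is why it runs on platforms that
// lack malloc and virtual memory. Size is an illustrative guess.
constexpr int kTensorArenaSize = 8 * 1024;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

int RunInference(const float* features, int num_features) {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the kernels this model needs to keep the binary small.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  // Assumes a float input tensor of at least num_features elements.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < num_features; ++i) input->data.f[i] = features[i];

  if (interpreter.Invoke() != kTfLiteOk) return -1;

  // Assumes a [1, num_classes] float output; return the best class index.
  TfLiteTensor* output = interpreter.output(0);
  int best = 0;
  for (int i = 1; i < output->dims->data[1]; ++i) {
    if (output->data.f[i] > output->data.f[best]) best = i;
  }
  return best;
}
```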
Related papers
- MicroFlow: An Efficient Rust-Based Inference Engine for TinyML [1.8902208722501446]
MicroFlow is an open-source framework for the deployment of Neural Networks (NNs) on embedded systems using the Rust programming language.
It uses less Flash and RAM than other state-of-the-art solutions for deploying NN reference models.
It also achieves faster inference than existing engines on medium-sized NNs, and similar performance on bigger ones.
arXiv Detail & Related papers (2024-09-28T18:34:27Z)
- Designing and Implementing a Generator Framework for a SIMD Abstraction Library [53.84310825081338]
We present TSLGen, a novel end-to-end framework for generating a SIMD abstraction library.
We show that our framework is comparable to existing libraries and achieves the same performance.
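The summary does not include TSLGen's generated code; purely as a hedged illustration of what a SIMD abstraction library abstracts, the sketch below shows one portable entry point that hides platform intrinsics (here SSE2, with a scalar fallback) behind a uniform signature. A generator framework emits many such wrappers from a single specification.

```cpp
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Generic elementwise add: same interface on every target, with the
// vectorized path selected at compile time. Not TSLGen's actual output.
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE2__)
  // Process four floats per iteration using 128-bit SSE registers.
  for (; i + 4 <= n; i += 4) {
    _mm_storeu_ps(out + i,
                  _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
  }
#endif
  // Scalar tail, and the full fallback on targets without SSE2.
  for (; i < n; ++i) out[i] = a[i] + b[i];
}
```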
arXiv Detail & Related papers (2024-07-26T13:25:38Z)
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and grow more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
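The paper's actual protocol is not given in this summary; as a hedged illustration of throughput-driven device assignment, the greedy sketch below places each model shard on the server with the most spare capacity. The Server struct and AssignShards are invented names for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Each participating device advertises how many shards per second it can
// serve; load tracks what has been assigned to it so far.
struct Server {
  double capacity;
  double load = 0.0;
};

// Greedy placement: every shard goes to the server with the largest spare
// capacity at that moment. A toy stand-in for a real balancing protocol.
std::vector<int> AssignShards(int num_shards, std::vector<Server>& servers) {
  std::vector<int> placement(num_shards);
  for (int shard = 0; shard < num_shards; ++shard) {
    auto best = std::max_element(
        servers.begin(), servers.end(),
        [](const Server& a, const Server& b) {
          return a.capacity - a.load < b.capacity - b.load;
        });
    best->load += 1.0;  // one more shard hosted on this server
    placement[shard] = static_cast<int>(best - servers.begin());
  }
  return placement;
}
```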
arXiv Detail & Related papers (2023-12-13T18:52:49Z)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
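Model FLOP utilization (MFU) is the standard ratio behind such comparisons: achieved arithmetic throughput as a fraction of the hardware's peak. A small helper with illustrative parameter names, not the paper's code:

```cpp
// MFU = (work per token * tokens processed per second) / peak throughput.
// A value near 1.0 means the hardware's arithmetic units are kept busy.
double ModelFlopUtilization(double flops_per_token, double tokens_per_second,
                            double peak_flops_per_second) {
  return (flops_per_token * tokens_per_second) / peak_flops_per_second;
}
```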
arXiv Detail & Related papers (2023-10-04T20:27:20Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system that unlocks the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, variable peer reliability, and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Virtualization of Tiny Embedded Systems with a robust real-time capable and extensible Stack Virtual Machine REXAVM supporting Material-integrated Intelligent Systems and Tiny Machine Learning [0.0]
This paper shows and evaluates the suitability of the proposed VM architecture for operationally equivalent software and hardware (FPGA) implementations.
In a holistic architecture approach, the VM specifically addresses digital signal processing and tiny machine learning.
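To make the stack-VM idea concrete, here is a toy dispatch loop; the opcodes and Run function are invented for illustration and are not REXAVM's instruction set.

```cpp
#include <cstddef>
#include <cstdint>

// A minimal stack machine: bytecode is fetched sequentially and every
// instruction operates on an operand stack rather than named registers.
enum Op : uint8_t { PUSH, ADD, MUL, HALT };

int32_t Run(const uint8_t* code) {
  int32_t stack[64];
  std::size_t sp = 0;  // next free stack slot
  for (std::size_t pc = 0;;) {
    switch (code[pc++]) {
      case PUSH: stack[sp++] = static_cast<int8_t>(code[pc++]); break;
      case ADD:  --sp; stack[sp - 1] += stack[sp]; break;
      case MUL:  --sp; stack[sp - 1] *= stack[sp]; break;
      case HALT: return stack[sp - 1];
    }
  }
}

// Usage: (2 + 3) * 4 evaluates to 20.
// const uint8_t prog[] = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
```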
arXiv Detail & Related papers (2023-02-17T17:13:35Z)
- Experimenting with Emerging RISC-V Systems for Decentralised Machine Learning [12.18598759507803]
Decentralised Machine Learning (DML) enables collaborative machine learning without centralised input data.
We map DML schemes to an underlying parallel programming library.
We experiment with it by generating different working DML schemes on x86-64 and ARM platforms and an emerging RISC-V one.
As a byproduct, we introduce a RISC-V porting of the PyTorch framework, the first publicly available to our knowledge.
arXiv Detail & Related papers (2023-02-15T20:57:42Z)
- A review of TinyML [0.0]
The TinyML concept for embedded machine learning attempts to push machine learning beyond the usual high-end approaches to low-end embedded applications.
TinyML is a rapidly expanding interdisciplinary topic at the convergence of machine learning, software, and hardware.
This paper explores how TinyML can benefit a few specific industrial fields, its obstacles, and its future scope.
arXiv Detail & Related papers (2022-11-05T06:02:08Z)
- TinyML for Ubiquitous Edge AI [0.0]
TinyML focuses on enabling deep learning algorithms on embedded (microcontroller-powered) devices operating at an extremely low power range (mW and below).
TinyML addresses the challenges in designing power-efficient, compact deep neural network models, supporting software framework, and embedded hardware.
In this report, we discuss the major challenges and technological enablers that direct this field's expansion.
arXiv Detail & Related papers (2021-02-02T02:04:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.