DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
- URL: http://arxiv.org/abs/2401.09068v1
- Date: Wed, 17 Jan 2024 09:01:50 GMT
- Title: DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
- Authors: Lixiang Han, Zhen Xiao, Zhenjiang Li
- Abstract summary: DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices.
The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML)
We propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models.
- Score: 12.014366791775027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DTMM is a library designed for efficient deployment and execution of machine
learning models on weak IoT devices such as microcontroller units (MCUs). The
motivation for designing DTMM comes from the emerging field of tiny machine
learning (TinyML), which explores extending the reach of machine learning to
many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak
capability of embedded devices, it is necessary to compress models by pruning
enough weights before deploying. Although pruning has been studied extensively
on many computing platforms, two key issues with pruning methods are
exacerbated on MCUs: models need to be deeply compressed without significantly
compromising accuracy, and they should perform efficiently after pruning.
Current solutions only achieve one of these objectives, but not both. In this
paper, we find that pruned models have great potential for efficient deployment
and execution on MCUs. Therefore, we propose DTMM with pruning unit selection,
pre-execution pruning optimizations, runtime acceleration, and post-execution
low-cost storage to fill the gap for efficient deployment and execution of
pruned models. It can be integrated into commercial ML frameworks for practical
deployment, and a prototype system has been developed. Extensive experiments on
various models show promising gains compared to state-of-the-art methods.
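As a concrete illustration of the kind of deep compression the abstract refers to, the sketch below performs generic filter-level magnitude pruning of a convolutional layer in NumPy. It is only a minimal example of structured pruning; the function name, tensor shapes, and pruning ratio are assumptions and do not reproduce DTMM's pruning unit selection, optimizations, or storage format.

```python
# Minimal sketch (not DTMM's method): filter-level magnitude pruning of a conv layer.
import numpy as np

def prune_conv_filters(weights: np.ndarray, prune_ratio: float):
    """Zero out the conv filters with the smallest L1 norms.

    weights: (out_channels, in_channels, kh, kw) filter tensor.
    prune_ratio: fraction of output filters to remove, e.g. 0.7.
    Returns the pruned tensor and a boolean mask of the kept filters.
    """
    l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    # Keep filters whose L1 norm is at least the (n_prune+1)-th smallest norm.
    threshold = np.sort(l1)[n_prune] if n_prune < weights.shape[0] else np.inf
    keep = l1 >= threshold
    pruned = weights * keep[:, None, None, None]
    return pruned, keep

# Example: prune roughly 70% of the 32 filters of a 3x3 conv layer.
w = np.random.randn(32, 16, 3, 3).astype(np.float32)
w_pruned, kept = prune_conv_filters(w, prune_ratio=0.7)
print(f"kept {int(kept.sum())}/{len(kept)} filters, "
      f"sparsity {(w_pruned == 0).mean():.2%}")
```

Pruning whole filters rather than individual weights keeps the remaining computation dense, which generally matters on MCUs that lack efficient sparse kernels.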
Related papers
- On-device Online Learning and Semantic Management of TinyML Systems [8.183732025472766]
This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production.
We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions.
We present semantic management for the joint management of models and devices at scale.
arXiv Detail & Related papers (2024-05-13T10:03:34Z)
- Optimization of Lightweight Malware Detection Models For AIoT Devices [2.4947404267499587]
Malware intrusion is a problem for Internet of Things (IoT) and Artificial Intelligence of Things (AIoT) devices.
This research aims to optimize the proposed super learner meta-learning ensemble model to make it viable for low-end AIoT devices.
arXiv Detail & Related papers (2024-04-06T09:30:38Z)
- MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric [57.3330687266266]
We find that using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.
Using the Module-wise Pruning Error (MoPE) metric, we introduce a unified pruning framework applicable to both pre-training and task-specific fine-tuning compression stages.
arXiv Detail & Related papers (2024-03-12T17:24:26Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- U-TOE: Universal TinyML On-board Evaluation Toolkit for Low-Power IoT [3.981958767941474]
U-TOE is a universal toolkit designed to facilitate the task of IoT designers and researchers.
We provide an open source implementation of U-TOE and demonstrate its use to experimentally evaluate the performance of various models.
arXiv Detail & Related papers (2023-06-26T10:35:31Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance: it can compress multiple heterogeneous models by up to 12.37x with negligible or no loss of accuracy.
arXiv Detail & Related papers (2022-03-08T01:24:36Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for using PLMs that address the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Prune2Edge: A Multi-Phase Pruning Pipelines to Deep Ensemble Learning in IIoT [0.0]
We propose a novel edge-based multi-phase pruning pipeline for ensemble learning on IIoT devices.
In the first phase, we generate a diverse ensemble of pruned models; we then apply integer quantisation and finally prune the generated ensemble using a clustering-based technique.
Our proposed approach was able to outperform a baseline model in predictive performance.
arXiv Detail & Related papers (2020-04-09T17:44:34Z)
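As a rough sketch of the first two steps the Prune2Edge summary above mentions (pruning the ensemble members, then integer quantisation), the example below applies unstructured magnitude pruning followed by symmetric int8 post-training quantization to a single weight matrix. The function names, sparsity level, and quantization scheme are assumptions for illustration, not the paper's implementation, and the clustering-based ensemble pruning step is omitted.

```python
# Illustrative only: magnitude pruning followed by symmetric int8 quantization.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the weights whose absolute value falls below the sparsity quantile."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization; returns the quantized weights and the scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 128).astype(np.float32)   # stand-in for one ensemble member's layer
w_sparse = magnitude_prune(w, sparsity=0.8)        # step 1: prune
q, scale = quantize_int8(w_sparse)                 # step 2: integer quantisation
print(f"sparsity {(w_sparse == 0).mean():.2%}, int8 scale {scale:.4f}")
```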