DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
- URL: http://arxiv.org/abs/2401.09068v1
- Date: Wed, 17 Jan 2024 09:01:50 GMT
- Title: DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning
- Authors: Lixiang Han, Zhen Xiao, Zhenjiang Li
- Abstract summary: DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices.
The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML)
We propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models.
- Score: 12.014366791775027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DTMM is a library designed for efficient deployment and execution of machine
learning models on weak IoT devices such as microcontroller units (MCUs). The
motivation for designing DTMM comes from the emerging field of tiny machine
learning (TinyML), which explores extending the reach of machine learning to
many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak
capability of embedded devices, it is necessary to compress models by pruning
enough weights before deploying. Although pruning has been studied extensively
on many computing platforms, two key issues with pruning methods are
exacerbated on MCUs: models need to be deeply compressed without significantly
compromising accuracy, and they should perform efficiently after pruning.
Current solutions only achieve one of these objectives, but not both. In this
paper, we find that pruned models have great potential for efficient deployment
and execution on MCUs. Therefore, we propose DTMM with pruning unit selection,
pre-execution pruning optimizations, runtime acceleration, and post-execution
low-cost storage to fill the gap for efficient deployment and execution of
pruned models. It can be integrated into commercial ML frameworks for practical
deployment, and a prototype system has been developed. Extensive experiments on
various models show promising gains compared to state-of-the-art methods.
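As a concrete illustration of the kind of deep compression the abstract refers to, the sketch below performs generic filter-level magnitude pruning of a convolutional layer in NumPy. It is only a minimal example of structured pruning; the function name, tensor shapes, and pruning ratio are assumptions and do not reproduce DTMM's pruning unit selection, optimizations, or storage format.

```python
# Minimal sketch (not DTMM's method): filter-level magnitude pruning of a conv layer.
import numpy as np

def prune_conv_filters(weights: np.ndarray, prune_ratio: float):
    """Zero out the conv filters with the smallest L1 norms.

    weights: (out_channels, in_channels, kh, kw) filter tensor.
    prune_ratio: fraction of output filters to remove, e.g. 0.7.
    Returns the pruned tensor and a boolean mask of the kept filters.
    """
    l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_prune = int(prune_ratio * weights.shape[0])
    # Keep filters whose L1 norm is at least the (n_prune+1)-th smallest norm.
    threshold = np.sort(l1)[n_prune] if n_prune < weights.shape[0] else np.inf
    keep = l1 >= threshold
    pruned = weights * keep[:, None, None, None]
    return pruned, keep

# Example: prune roughly 70% of the 32 filters of a 3x3 conv layer.
w = np.random.randn(32, 16, 3, 3).astype(np.float32)
w_pruned, kept = prune_conv_filters(w, prune_ratio=0.7)
print(f"kept {int(kept.sum())}/{len(kept)} filters, "
      f"sparsity {(w_pruned == 0).mean():.2%}")
```

Pruning whole filters rather than individual weights keeps the remaining computation dense, which generally matters on MCUs that lack efficient sparse kernels.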
Related papers
- On-device Online Learning and Semantic Management of TinyML Systems [8.183732025472766]
This study aims to bridge the gap between prototyping single TinyML models and developing reliable TinyML systems in production.
We propose online learning to enable training on constrained devices, adapting local models towards the latest field conditions.
We present semantic management for the joint management of models and devices at scale.
arXiv Detail & Related papers (2024-05-13T10:03:34Z)
- Optimization of Lightweight Malware Detection Models For AIoT Devices [2.4947404267499587]
Malware intrusion is a problem for Internet of Things (IoT) and Artificial Intelligence of Things (AIoT) devices.
This research aims to optimize the proposed super learner meta-learning ensemble model to make it viable for low-end AIoT devices.
arXiv Detail & Related papers (2024-04-06T09:30:38Z)
- MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric [57.3330687266266]
We find that using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance.
Using the Module-wise Pruning Error (MoPE) metric, we introduce a unified pruning framework applicable to both pre-training and task-specific fine-tuning compression stages.
arXiv Detail & Related papers (2024-03-12T17:24:26Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity in a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- U-TOE: Universal TinyML On-board Evaluation Toolkit for Low-Power IoT [3.981958767941474]
U-TOE is a universal toolkit designed to facilitate the task of IoT designers and researchers.
We provide an open source implementation of U-TOE and demonstrate its use to experimentally evaluate the performance of various models.
arXiv Detail & Related papers (2023-06-26T10:35:31Z)
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z)
- MetaNetwork: A Task-agnostic Network Parameters Generation Framework for Improving Device Model Generalization [65.02542875281233]
We propose a novel task-agnostic framework, named MetaNetwork, for generating adaptive device model parameters from the cloud without on-device training.
The MetaGenerator is designed to learn a mapping function from samples to model parameters, and it can generate and deliver the adaptive parameters to the device based on samples uploaded from the device to the cloud.
The MetaStabilizer aims to reduce the oscillation of the MetaGenerator, accelerate the convergence and improve the model performance during both training and inference.
arXiv Detail & Related papers (2022-09-12T13:26:26Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance: it can compress multiple heterogeneous models by up to 12.37x with negligible or no loss of accuracy.
arXiv Detail & Related papers (2022-03-08T01:24:36Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for using PLMs that address the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Prune2Edge: A Multi-Phase Pruning Pipelines to Deep Ensemble Learning in IIoT [0.0]
We propose a novel edge-based multi-phase pruning pipeline for ensemble learning on IIoT devices.
In the first phase, we generate a diverse ensemble of pruned models; we then apply integer quantisation and finally prune the generated ensemble using a clustering-based technique.
Our proposed approach was able to outperform a baseline model in predictive performance.
arXiv Detail & Related papers (2020-04-09T17:44:34Z)
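As a rough sketch of the first two steps the Prune2Edge summary above mentions (pruning the ensemble members, then integer quantisation), the example below applies unstructured magnitude pruning followed by symmetric int8 post-training quantization to a single weight matrix. The function names, sparsity level, and quantization scheme are assumptions for illustration, not the paper's implementation, and the clustering-based ensemble pruning step is omitted.

```python
# Illustrative only: magnitude pruning followed by symmetric int8 quantization.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the weights whose absolute value falls below the sparsity quantile."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w: np.ndarray):
    """Symmetric int8 quantization; returns the quantized weights and the scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 128).astype(np.float32)   # stand-in for one ensemble member's layer
w_sparse = magnitude_prune(w, sparsity=0.8)        # step 1: prune
q, scale = quantize_int8(w_sparse)                 # step 2: integer quantisation
print(f"sparsity {(w_sparse == 0).mean():.2%}, int8 scale {scale:.4f}")
```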