TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
- URL: http://arxiv.org/abs/2311.01759v1
- Date: Fri, 3 Nov 2023 07:34:47 GMT
- Title: TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
- Authors: Jianlei Yang, Jiacheng Liao, Fanding Lei, Meichen Liu, Junyi Chen,
Lingkun Long, Han Wan, Bei Yu, Weisheng Zhao
- Abstract summary: TinyFormer is a framework designed to develop and deploy resource-efficient transformers on MCUs.
TinyFormer mainly consists of SuperNAS, SparseNAS and SparseEngine.
TinyFormer can develop efficient transformers with an accuracy of $96.1\%$ while adhering to hardware constraints of $1$MB storage and $320$KB memory.
- Score: 7.529632803434906
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Developing deep learning models on tiny devices (e.g. Microcontroller units,
MCUs) has attracted much attention in various embedded IoT applications.
However, it is challenging to efficiently design and deploy recent advanced
models (e.g. transformers) on tiny devices due to their severe hardware
resource constraints. In this work, we propose TinyFormer, a framework
specifically designed to develop and deploy resource-efficient transformers on
MCUs. TinyFormer mainly consists of SuperNAS, SparseNAS and SparseEngine.
Specifically, SuperNAS aims to search for an appropriate supernet from a vast
search space. SparseNAS then searches the identified supernet for the best
sparse single-path model, including its transformer architecture. Finally,
SparseEngine efficiently deploys the searched sparse models onto MCUs. To the
best of our knowledge, SparseEngine is the first deployment framework capable of
performing inference of sparse transformer models on MCUs. Evaluation results on the
CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers
with an accuracy of $96.1\%$ while adhering to hardware constraints of $1$MB
storage and $320$KB memory. Additionally, TinyFormer achieves significant
speedups in sparse inference, up to $12.2\times$, when compared to the CMSIS-NN
library. TinyFormer is believed to bring powerful transformers into TinyML
scenarios and greatly expand the scope of deep learning applications.
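As an illustration of the sparse inference that a deployment engine such as SparseEngine must carry out, the sketch below implements a compressed-sparse-row (CSR) matrix-vector product that stores and multiplies only the non-zero weights. It is a minimal, assumed example for readability; an actual MCU engine would run quantized int8 kernels in C, as CMSIS-NN does, rather than NumPy floats.

```python
import numpy as np

def dense_to_csr(w, threshold=0.0):
    """Pack a pruned dense weight matrix into a simple CSR layout,
    keeping only entries with |w| > threshold."""
    values, col_idx, row_ptr = [], [], [0]
    for row in w:
        keep = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[keep])
        col_idx.extend(keep)
        row_ptr.append(len(values))
    return (np.array(values, dtype=w.dtype),
            np.array(col_idx, dtype=np.int32),
            np.array(row_ptr, dtype=np.int32))

def csr_matvec(values, col_idx, row_ptr, x):
    """y = W @ x using only the stored non-zero weights; skipping the zero
    multiplies (and their storage) is where sparse speedups come from."""
    y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: a linear layer with 75% of its weights pruned to zero.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w[rng.random(w.shape) < 0.75] = 0.0
x = rng.standard_normal(64).astype(np.float32)

vals, cols, ptr = dense_to_csr(w)
assert np.allclose(csr_matvec(vals, cols, ptr, x), w @ x, atol=1e-4)
print(f"stored weights: {len(vals)} of {w.size}")
```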
Related papers
- DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with Pruning [12.014366791775027]
DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices.
The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML).
We propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to close this gap and enable efficient deployment and execution of pruned models.
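For orientation, here is a heavily simplified, assumed sketch of what selecting pruning units under a storage budget can look like: weights are grouped into fixed-size units, scored by magnitude, and the lowest-scoring units are dropped. DTMM's actual selection is driven by its own cost model, which this toy greedy rule does not capture.

```python
import numpy as np

def select_pruning_units(weights, unit_size, storage_budget_bytes, bytes_per_weight=1):
    """Greedy magnitude-based selection of fixed-size pruning units:
    keep the highest-norm units until the storage budget is exhausted."""
    flat = weights.reshape(-1, unit_size)
    scores = np.linalg.norm(flat, axis=1)            # importance score per unit
    order = np.argsort(scores)[::-1]                 # most important first
    budget_units = storage_budget_bytes // (unit_size * bytes_per_weight)
    keep = np.zeros(len(flat), dtype=bool)
    keep[order[:budget_units]] = True
    pruned = flat.copy()
    pruned[~keep] = 0.0                              # zero out the dropped units
    return pruned.reshape(weights.shape), keep

rng = np.random.default_rng(1)
w = rng.standard_normal((128, 128)).astype(np.float32)
pruned_w, kept = select_pruning_units(w, unit_size=16, storage_budget_bytes=4096)
print(f"kept {kept.sum()} of {len(kept)} pruning units")
```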
arXiv Detail & Related papers (2024-01-17T09:01:50Z) - MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z) - TinyReptile: TinyML with Federated Meta-Learning [9.618821589196624]
We propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning.
We demonstrate TinyReptile on Raspberry Pi 4 and Cortex-M4 MCU with only 256-KB RAM.
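Since the method builds on the Reptile meta-learning algorithm, a bare-bones Reptile round is sketched below: each simulated device runs a few local SGD steps, then the shared parameters are nudged toward the adapted ones. The regression task, hyperparameters, and federation loop are illustrative assumptions and do not reproduce TinyReptile's on-device details.

```python
import numpy as np

def local_sgd(theta, x, y, lr=0.05, steps=10):
    """A device's inner loop: plain SGD on its own linear-regression data."""
    w = theta.copy()
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def reptile_round(theta, clients, meta_lr=0.5):
    """One meta-learning round: adapt locally on each client, then move the
    shared parameters a fraction of the way toward the adapted weights."""
    for x, y in clients:
        adapted = local_sgd(theta, x, y)
        theta = theta + meta_lr * (adapted - theta)
    return theta

rng = np.random.default_rng(5)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):                                   # four simulated tiny devices
    x = rng.standard_normal((20, 3))
    y = x @ true_w + 0.1 * rng.standard_normal(20)
    clients.append((x, y))

theta = np.zeros(3)
for _ in range(30):
    theta = reptile_round(theta, clients)
print("shared weights after 30 rounds:", np.round(theta, 2))
```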
arXiv Detail & Related papers (2023-04-11T13:11:10Z) - Reversible Vision Transformers [74.3500977090597]
Reversible Vision Transformers are a memory efficient architecture for visual recognition.
We adapt two popular models, namely Vision Transformer and Multiscale Vision Transformers, to reversible variants.
We find that the additional computational burden of recomputing activations is more than overcome for deeper models.
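The memory saving rests on the standard reversible-block construction, sketched below with toy stand-ins for the attention and MLP sub-blocks: because the inputs can be recomputed exactly from the outputs, intermediate activations need not be cached during training. The functions F and G are placeholders, not the paper's actual sub-blocks.

```python
import numpy as np

def F(x):   # stand-in for an attention sub-block
    return np.tanh(0.5 * x)

def G(x):   # stand-in for an MLP sub-block
    return 0.1 * np.maximum(x, 0.0)

def reversible_forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    x2 = y2 - G(y1)          # recover x2 without any stored activations
    x1 = y1 - F(x2)          # then recover x1
    return x1, x2

rng = np.random.default_rng(2)
x1, x2 = rng.standard_normal(8), rng.standard_normal(8)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
print("inputs recovered exactly from outputs")
```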
arXiv Detail & Related papers (2023-02-09T18:59:54Z) - TinyViT: Fast Pretraining Distillation for Small Vision Transformers [88.54212027516755]
We propose TinyViT, a new family of tiny and efficient vision transformers pretrained on large-scale datasets.
The central idea is to transfer knowledge from large pretrained models to small ones, while enabling small models to get the dividends of massive pretraining data.
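The transfer step is essentially logit distillation; a minimal version of the temperature-softened KL loss is sketched below. TinyViT's additional engineering, such as precomputing teacher logits offline, is omitted, and the batch of random logits is purely illustrative.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()

rng = np.random.default_rng(3)
teacher = rng.standard_normal((32, 1000))            # teacher logits for a batch
student = rng.standard_normal((32, 1000))            # student logits, same batch
print(f"distillation loss: {distillation_loss(student, teacher):.4f}")
```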
arXiv Detail & Related papers (2022-07-21T17:59:56Z) - AutoFormer: Searching Transformers for Visual Recognition [97.60915598958968]
We propose a new one-shot architecture search framework, namely AutoFormer, dedicated to vision transformer search.
AutoFormer entangles the weights of different blocks in the same layers during supernet training.
We show that AutoFormer-tiny/small/base achieve 74.7%/81.7%/82.4% top-1 accuracy on ImageNet with 5.7M/22.9M/53.7M parameters.
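The weight-entanglement idea can be pictured with a single shared weight tensor from which every candidate width is sliced, as in the assumed sketch below; candidate choices therefore update overlapping weights instead of keeping separate copies per choice. The class and dimensions are illustrative, not AutoFormer's implementation.

```python
import numpy as np

class EntangledLinear:
    """One supernet layer whose candidate widths all share slices of a single
    weight matrix: picking a smaller dimension reuses the leading rows and
    columns of the largest candidate's weights."""
    def __init__(self, max_in, max_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w = 0.02 * rng.standard_normal((max_out, max_in)).astype(np.float32)

    def forward(self, x, out_dim):
        in_dim = x.shape[-1]
        # Slice the shared weights down to this candidate's size.
        return x @ self.w[:out_dim, :in_dim].T

layer = EntangledLinear(max_in=256, max_out=256)
x = np.random.default_rng(4).standard_normal((2, 192)).astype(np.float32)
for width in (64, 128, 256):          # three candidate widths, one weight tensor
    print(width, layer.forward(x, out_dim=width).shape)
```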
arXiv Detail & Related papers (2021-07-01T17:59:30Z) - Escaping the Big Data Paradigm with Compact Transformers [7.697698018200631]
We show for the first time that with the right size and tokenization, transformers can perform head-to-head with state-of-the-art CNNs on small datasets.
Our method is flexible in terms of model size, and can have as little as 0.28M parameters and achieve reasonable results.
arXiv Detail & Related papers (2021-04-12T17:58:56Z) - $\mu$NAS: Constrained Neural Architecture Search for Microcontrollers [15.517404770022633]
IoT devices are powered by microcontroller units (MCUs) which are extremely resource-scarce.
We build a neural architecture search (NAS) system, called $\mu$NAS, to automate the design of such small-yet-powerful MCU-level networks.
$\mu$NAS is able to (a) improve top-1 classification accuracy by up to 4.8%, (b) reduce memory footprint by 4--13x, or (c) reduce the number of multiply-accumulate operations by at least 2x.
arXiv Detail & Related papers (2020-10-27T12:42:53Z) - MCUNet: Tiny Deep Learning on IoT Devices [62.752899523628066]
We propose a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine).
TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints and then specializes the network architecture within the optimized search space.
TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing the memory usage by 4.8x.
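The first of the two stages can be pictured as filtering a grid of global design choices (width multiplier, input resolution) through the MCU's storage and SRAM budgets before any architecture search runs, as in the sketch below. The cost model is a crude assumption for illustration, not MCUNet's profiler.

```python
# Toy version of search-space optimization: keep only (width, resolution)
# settings whose estimated model size and peak activation memory fit the MCU.
FLASH_BUDGET = 1_000_000        # ~1 MB storage for int8 weights
SRAM_BUDGET = 320_000           # ~320 KB for activations

def estimate_costs(width_mult, resolution, base_params=4_000_000, base_act=2_000_000):
    # Assumed scaling: parameters grow with width^2, peak activations with
    # width and the square of the input resolution.
    params = int(base_params * width_mult ** 2)
    peak_act = int(base_act * width_mult * (resolution / 224) ** 2)
    return params, peak_act

candidates = [(w, r) for w in (0.3, 0.4, 0.5, 0.75, 1.0)
                     for r in (96, 128, 160, 192, 224)]
feasible = []
for w, r in candidates:
    params, peak_act = estimate_costs(w, r)
    if params <= FLASH_BUDGET and peak_act <= SRAM_BUDGET:
        feasible.append((w, r))
print(f"{len(feasible)} of {len(candidates)} (width, resolution) settings fit the budgets")
```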
arXiv Detail & Related papers (2020-07-20T17:59:01Z) - HAT: Hardware-Aware Transformers for Efficient Natural Language Processing [78.48577649266018]
Hardware-Aware Transformers (HAT) are designed to enable low-latency inference on resource-constrained hardware platforms.
We train a $\textit{SuperTransformer}$ that covers all candidates in the design space, and efficiently produces many $\textit{SubTransformers}$ with weight sharing.
Experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware.
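The search side of this can be pictured as sampling SubTransformer configurations from the SuperTransformer's design space and ranking them with a hardware cost model, as in the assumed sketch below. The dimension choices and the latency function are placeholders; HAT trains a latency predictor per target device and uses evolutionary search rather than random sampling.

```python
import random

DESIGN_SPACE = {
    "embed_dim":  [384, 512, 640],
    "num_layers": [2, 3, 4, 5, 6],
    "ffn_dim":    [1024, 2048, 3072],
    "num_heads":  [4, 8],
}

def sample_subtransformer(rng):
    """Each sampled configuration reuses the SuperTransformer's weights,
    so evaluating it needs no training from scratch."""
    return {name: rng.choice(options) for name, options in DESIGN_SPACE.items()}

def estimated_latency_ms(cfg):
    # Placeholder cost model standing in for a learned hardware latency predictor.
    return 0.002 * cfg["embed_dim"] * cfg["num_layers"] + 0.0005 * cfg["ffn_dim"]

rng = random.Random(0)
pool = [sample_subtransformer(rng) for _ in range(50)]
fastest = min(pool, key=estimated_latency_ms)
print(f"fastest sampled SubTransformer: {fastest} "
      f"(~{estimated_latency_ms(fastest):.1f} ms estimated)")
```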
arXiv Detail & Related papers (2020-05-28T17:58:56Z)