Dynamic ConvNets on Tiny Devices via Nested Sparsity
- URL: http://arxiv.org/abs/2203.03324v1
- Date: Mon, 7 Mar 2022 12:07:02 GMT
- Title: Dynamic ConvNets on Tiny Devices via Nested Sparsity
- Authors: Matteo Grimaldi, Luca Mocerino, Antonio Cipolletta, Andrea Calimera
- Abstract summary: This work introduces a new training and compression pipeline to build Nested Sparse ConvNets.
A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets.
Nested Sparse ConvNets are tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit.
- Score: 3.0313758880048765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces a new training and compression pipeline to build Nested
Sparse ConvNets, a class of dynamic Convolutional Neural Networks (ConvNets)
suited for inference tasks deployed on resource-constrained devices at the edge
of the Internet-of-Things. A Nested Sparse ConvNet consists of a single ConvNet
architecture containing N sparse sub-networks with nested weights subsets, like
a Matryoshka doll, and can trade accuracy for latency at run time, using the
model sparsity as a dynamic knob. To attain high accuracy at training time, we
propose a gradient masking technique that optimally routes the learning signals
across the nested weights subsets. To minimize the storage footprint and
efficiently process the obtained models at inference time, we introduce a new
sparse matrix compression format with dedicated compute kernels that fruitfully
exploit the characteristic of the nested weights subsets. Tested on image
classification and object detection tasks on an off-the-shelf ARM-M7 Micro
Controller Unit (MCU), Nested Sparse ConvNets outperform variable-latency
solutions naively built assembling single sparse models trained as stand-alone
instances, achieving (i) comparable accuracy, (ii) remarkable storage savings,
and (iii) high performance. Moreover, when compared to state-of-the-art dynamic
strategies, like dynamic pruning and layer width scaling, Nested Sparse
ConvNets turn out to be Pareto optimal in the accuracy vs. latency space.
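To make the mechanism concrete, the sketch below shows one way to realize nested weight subsets and joint training in PyTorch. It is an illustration under simplifying assumptions, not the authors' pipeline: masks are fixed at initialization, a single linear layer stands in for a ConvNet, and the gradient routing is just the mask product, whereas the paper's gradient-masking technique routes learning signals across the subsets more deliberately.
```python
# Minimal sketch (not the authors' code) of nested magnitude-based masks over a
# shared weight tensor and joint training of the resulting sub-networks.
# Layer sizes, densities, and the single training step are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def nested_masks(weight, densities=(0.25, 0.5, 1.0)):
    """Top-magnitude masks; a smaller keep-ratio is a subset of a larger one."""
    scores = weight.abs().flatten()
    order = scores.argsort(descending=True)      # indices sorted by |w|, largest first
    masks = []
    for d in densities:
        k = max(1, int(d * scores.numel()))
        m = torch.zeros_like(scores)
        m[order[:k]] = 1.0                       # keep the k largest-magnitude weights
        masks.append(m.view_as(weight))
    return masks                                 # nested: mask 0 is a subset of mask 1, etc.

class NestedSparseLinear(nn.Module):
    """One layer that can run at any of the nested sparsity levels."""
    def __init__(self, in_features, out_features, densities=(0.25, 0.5, 1.0)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        for i, m in enumerate(nested_masks(self.weight.detach(), densities)):
            self.register_buffer(f"mask{i}", m)  # masks fixed here; re-pruning omitted

    def forward(self, x, level):
        mask = getattr(self, f"mask{level}")
        return F.linear(x, self.weight * mask)   # inactive weights contribute nothing

# Joint training step: every sub-network sees the batch; via the mask product,
# each weight only receives gradients from the levels whose mask includes it
# (a simplified stand-in for the paper's gradient-masking rule).
layer = NestedSparseLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = sum(F.cross_entropy(layer(x, lvl), y) for lvl in range(3))
loss.backward()
opt.step()
```
At deployment only the densest level's nonzero weights need to be stored, since every sparser level is a subset of them; this nesting is what the paper's compression format and kernels exploit.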
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the limited on-device resources of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full
Context Interaction [0.0]
The self-attention mechanism utilizes large implicit weight matrices, programmed through dot product-based activations with very few trainable parameters, to enable long sequence modeling.
In this paper, we investigate the possibility of discarding residual learning by employing large implicit kernels to achieve full context interaction at each layer of the network.
Our model incorporates several innovative components and exhibits excellent properties, such as introducing local feedback error for updating the slow network, stable zero-mean features, faster training convergence, and fewer model parameters.
arXiv Detail & Related papers (2024-01-31T15:57:21Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms-spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate
Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z) - DRESS: Dynamic REal-time Sparse Subnets [7.76526807772015]
We propose a novel training algorithm, Dynamic REal-time Sparse Subnets (DRESS).
DRESS samples multiple sub-networks from the same backbone network through row-based unstructured sparsity, and jointly trains these sub-networks in parallel with a weighted loss.
Experiments on public vision datasets show that DRESS yields significantly higher accuracy than state-of-the-art sub-networks.
arXiv Detail & Related papers (2022-07-01T22:05:07Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-based Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
With a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Compact Multi-level Sparse Neural Networks with Input Independent
Dynamic Rerouting [33.35713740886292]
Sparse deep neural networks can substantially reduce the complexity and memory consumption of the models.
To face real-life deployment challenges, we propose to train a sparse model that supports multiple sparsity levels.
In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model.
arXiv Detail & Related papers (2021-12-21T01:35:51Z) - Optimising for Interpretability: Convolutional Dynamic Alignment
Networks [108.83345790813445]
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets).
Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns.
CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions.
arXiv Detail & Related papers (2021-09-27T12:39:46Z) - DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
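Several entries above (DRESS, Compact Multi-level Sparse Neural Networks, DS-Net++) share the run-time pattern of Nested Sparse ConvNets: one set of weights, several nested or sliceable sub-models, and a knob that selects one of them per latency budget. Below is a small, hypothetical Python sketch of such a selection step; the function names and the budget are illustrative and do not come from any of the papers.
```python
# Illustrative only: choose the densest sub-model whose measured latency fits a
# per-inference budget. Assumes levels are ordered sparsest (fastest) to
# densest (slowest, most accurate); run_level(x, level) runs one inference.
import time

def pick_level(run_level, num_levels, budget_ms, probe_input):
    chosen = 0
    for level in range(num_levels):
        start = time.perf_counter()
        run_level(probe_input, level)            # profile this sparsity level once
        elapsed_ms = (time.perf_counter() - start) * 1e3
        if elapsed_ms <= budget_ms:
            chosen = level                       # this denser level still meets the budget
        else:
            break                                # denser levels will only be slower
    return chosen
```
On an MCU this profiling would typically happen once offline, with the chosen level then switched at run time as the latency budget changes.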