REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
- URL: http://arxiv.org/abs/2311.13349v2
- Date: Wed, 20 Mar 2024 10:21:34 GMT
- Title: REDS: Resource-Efficient Deep Subnetworks for Dynamic Resource Constraints
- Authors: Francesco Corti, Balz Maag, Joachim Schauer, Ulrich Pferschy, Olga Saukh,
- Abstract summary: State-of-the-art machine learning pipelines generate resource-agnostic models, not capable to adapt at runtime.
We introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources.
We provide a theoretical result and empirical evidence for REDS outstanding performance in terms of submodels' test set accuracy.
- Score: 2.9209462960232235
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep models deployed on edge devices frequently encounter resource variability, which arises from fluctuating energy levels, timing constraints, or prioritization of other critical tasks within the system. State-of-the-art machine learning pipelines generate resource-agnostic models, not capable to adapt at runtime. In this work we introduce Resource-Efficient Deep Subnetworks (REDS) to tackle model adaptation to variable resources. In contrast to the state-of-the-art, REDS use structured sparsity constructively by exploiting permutation invariance of neurons, which allows for hardware-specific optimizations. Specifically, REDS achieve computational efficiency by (1) skipping sequential computational blocks identified by a novel iterative knapsack optimizer, and (2) leveraging simple math to re-arrange the order of operations in REDS computational graph to take advantage of the data cache. REDS support conventional deep networks frequently deployed on the edge and provide computational benefits even for small and simple networks. We evaluate REDS on seven benchmark architectures trained on the Visual Wake Words, Google Speech Commands, Fashion-MNIST and CIFAR10 datasets, and test on four off-the-shelf mobile and embedded hardware platforms. We provide a theoretical result and empirical evidence for REDS outstanding performance in terms of submodels' test set accuracy, and demonstrate an adaptation time in response to dynamic resource constraints of under 40$\mu$s, utilizing a 2-layer fully-connected network on Arduino Nano 33 BLE.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Toward Efficient Convolutional Neural Networks With Structured Ternary Patterns [1.1965844936801797]
Convolutional neural networks (ConvNets) exert severe demands on local device resources.
This brief presents work toward utilizing static convolutional filters to design efficient ConvNet architectures.
arXiv Detail & Related papers (2024-07-20T10:18:42Z) - REP: Resource-Efficient Prompting for On-device Continual Learning [23.92661395403251]
On-device continual learning (CL) requires the co-optimization of model accuracy and resource efficiency to be practical.
It is commonly believed that CNN-based CL excels in resource efficiency, whereas ViT-based CL is superior in model performance.
We introduce REP, which improves resource efficiency specifically targeting prompt-based rehearsal-free methods.
arXiv Detail & Related papers (2024-06-07T09:17:33Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - DRESS: Dynamic REal-time Sparse Subnets [7.76526807772015]
We propose a novel training algorithm, Dynamic REal-time Sparse Subnets (DRESS)
DRESS samples multiple sub-networks from the same backbone network through row-based unstructured sparsity, and jointly trains these sub-networks in parallel with weighted loss.
Experiments on public vision datasets show that DRESS yields significantly higher accuracy than state-of-the-art sub-networks.
arXiv Detail & Related papers (2022-07-01T22:05:07Z) - Deep Learning-based Resource Allocation For Device-to-Device
Communication [66.74874646973593]
We propose a framework for the optimization of the resource allocation in multi-channel cellular systems with device-to-device (D2D) communication.
A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models.
Our simulation results confirm that near-optimal performance can be attained with low time, which underlines the real-time capability of the proposed scheme.
arXiv Detail & Related papers (2020-11-25T14:19:23Z) - Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs)
In this paper, we take a step forward to address this issue by leveraging the adaptive inference networks for deep SISR (AdaDSR)
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv Detail & Related papers (2020-04-08T10:08:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.