Latency-Memory Optimized Splitting of Convolution Neural Networks for
Resource Constrained Edge Devices
- URL: http://arxiv.org/abs/2107.09123v1
- Date: Mon, 19 Jul 2021 19:39:56 GMT
- Title: Latency-Memory Optimized Splitting of Convolution Neural Networks for
Resource Constrained Edge Devices
- Authors: Tanmay Jain, Avaneesh, Rohit Verma, Rajeev Shorey
- Abstract summary: We argue that running CNNs between an edge device and the cloud is synonymous with solving a resource-constrained optimization problem.
Experiments done on real-world edge devices show that LMOS ensures feasible execution of different CNN models at the edge.
- Score: 1.6873748786804317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing reliance of users on smart devices, bringing essential
computation to the edge has become a crucial requirement for any type of
business. Many such computations use Convolution Neural Networks (CNNs) to
perform AI tasks, and their high resource and computation requirements are
infeasible for edge devices. Splitting the CNN architecture to perform part of
the computation on the edge and the remainder on the cloud is an area of
research that has seen increasing interest. In this paper, we assert that
running CNNs between an edge device and the cloud is synonymous with solving a
resource-constrained optimization problem that minimizes the latency and
maximizes resource utilization at the edge. We formulate a multi-objective
optimization problem and propose the LMOS algorithm to achieve a Pareto
efficient solution. Experiments done on real-world edge devices show that LMOS
ensures feasible execution of different CNN models at the edge and also
improves upon existing state-of-the-art approaches.
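The trade-off described in the abstract can be illustrated with a small sketch. This is not the LMOS algorithm, whose details are not given here; it is a brute-force enumeration over split points under stated assumptions. The `Layer` fields, the additive memory model, the per-layer timings, and the names `evaluate_splits` and `pareto_front` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    # All figures are made-up illustration values, not measurements.
    edge_ms: float   # time to run this layer on the edge device
    cloud_ms: float  # time to run this layer in the cloud
    mem_mb: float    # memory this layer needs on the edge device
    out_kb: float    # size of this layer's output activation


def evaluate_splits(layers, input_kb, bandwidth_kb_per_ms, edge_mem_mb):
    """List (split, latency_ms, edge_mem_used_mb) for each feasible split.

    split = k means layers[:k] run on the edge and layers[k:] in the cloud.
    Assumption: edge memory use is the sum of the per-layer requirements.
    """
    results = []
    for k in range(len(layers) + 1):
        edge_mem = sum(layer.mem_mb for layer in layers[:k])
        if edge_mem > edge_mem_mb:
            break  # memory only grows with k, so later splits are infeasible too
        edge_time = sum(layer.edge_ms for layer in layers[:k])
        cloud_time = sum(layer.cloud_ms for layer in layers[k:])
        # Data sent to the cloud: the raw input if nothing runs on the edge,
        # nothing if everything runs on the edge, else the last edge output.
        if k == len(layers):
            transfer = 0.0
        else:
            sent_kb = input_kb if k == 0 else layers[k - 1].out_kb
            transfer = sent_kb / bandwidth_kb_per_ms
        results.append((k, edge_time + transfer + cloud_time, edge_mem))
    return results


def pareto_front(results):
    """Keep splits that no other split beats on both objectives
    (lower latency is better, higher edge memory utilization is better)."""
    front = []
    for r in results:
        dominated = any(
            o[1] <= r[1] and o[2] >= r[2] and (o[1] < r[1] or o[2] > r[2])
            for o in results
        )
        if not dominated:
            front.append(r)
    return front


# Hypothetical four-layer model on a device with 100 MB of usable memory.
layers = [
    Layer(edge_ms=4.0, cloud_ms=1.0, mem_mb=10.0, out_kb=400.0),
    Layer(edge_ms=6.0, cloud_ms=1.0, mem_mb=20.0, out_kb=100.0),
    Layer(edge_ms=12.0, cloud_ms=2.0, mem_mb=40.0, out_kb=20.0),
    Layer(edge_ms=20.0, cloud_ms=3.0, mem_mb=80.0, out_kb=4.0),
]
front = pareto_front(
    evaluate_splits(layers, input_kb=600.0, bandwidth_kb_per_ms=10.0, edge_mem_mb=100.0)
)
for split, latency_ms, mem_mb in front:
    print(f"split after layer {split}: {latency_ms} ms, {mem_mb} MB on edge")
```

With these toy numbers, two splits remain on the front: one with lower latency and one that uses more of the edge device's memory. Choosing between them is the kind of decision a multi-objective formulation has to make.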
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- Evolution of Convolutional Neural Network (CNN): Compute vs Memory bandwidth for Edge AI [0.0]
This article explores the relationship between CNN compute requirements and memory bandwidth in the context of Edge AI.
We examine the impact of increasing model complexity on both computational requirements and memory access patterns.
This analysis provides insights into designing efficient architectures and potential hardware accelerators in enhancing CNN performance on edge devices.
arXiv Detail & Related papers (2023-09-24T09:11:22Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Complexity-Driven CNN Compression for Resource-constrained Edge AI [1.6114012813668934]
We propose a novel and computationally efficient pruning pipeline by exploiting the inherent layer-level complexities of CNNs.
We define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs.
arXiv Detail & Related papers (2022-08-26T16:01:23Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices [39.09319776243573]
CoEdge is a distributed Deep Neural Network (DNN) computing system that orchestrates cooperative inference over heterogeneous edge devices.
CoEdge saves energy while keeping inference latency close, achieving up to 25.5% to 66.9% energy reduction for four widely adopted CNN models.
arXiv Detail & Related papers (2020-12-06T13:15:52Z)
- Joint Multi-User DNN Partitioning and Computational Resource Allocation for Collaborative Edge Intelligence [21.55340197267767]
Mobile Edge Computing (MEC) has emerged as a promising supporting architecture providing a variety of resources to the network edge.
With the assistance of edge servers, user equipments (UEs) are able to run deep neural network (DNN) based AI applications.
We propose an algorithm called Iterative Alternating Optimization (IAO) that can achieve the optimal solution in time.
arXiv Detail & Related papers (2020-07-15T09:40:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.