Neural Architecture Search for Improving Latency-Accuracy Trade-off in
Split Computing
- URL: http://arxiv.org/abs/2208.13968v1
- Date: Tue, 30 Aug 2022 03:15:43 GMT
- Title: Neural Architecture Search for Improving Latency-Accuracy Trade-off in
Split Computing
- Authors: Shoma Shimizu, Takayuki Nishio, Shota Saito, Yoichi Hirose, Chen
Yen-Hsiu, Shinichi Shirakawa
- Abstract summary: Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
- Score: 5.516431145236317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a neural architecture search (NAS) method for split
computing. Split computing is an emerging machine-learning inference technique
that addresses the privacy and latency challenges of deploying deep learning in
IoT systems. In split computing, neural network models are separated and
cooperatively processed using edge servers and IoT devices via networks. Thus,
the architecture of the neural network model significantly impacts the
communication payload size, model accuracy, and computational load. In this
paper, we address the challenge of optimizing neural network architecture for
split computing. To this end, we proposed NASC, which jointly explores optimal
model architecture and a split point to achieve higher accuracy while meeting
latency requirements (i.e., smaller total latency of computation and
communication than a certain threshold). NASC employs a one-shot NAS that does
not require repeating model training for a computationally efficient
architecture search. Our performance evaluation using hardware (HW)-NAS-Bench
of benchmark data demonstrates that the proposed NASC can improve the
``communication latency and model accuracy" trade-off, i.e., reduce the latency
by approximately 40-60% from the baseline, with slight accuracy degradation.
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices [0.30458577208819987]
We aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM)
We propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints.
The resulting model from NAS optimized for speed achieved 5x-30x speedup.
arXiv Detail & Related papers (2024-09-27T15:35:36Z) - A Pairwise Comparison Relation-assisted Multi-objective Evolutionary Neural Architecture Search Method with Multi-population Mechanism [58.855741970337675]
Neural architecture search (NAS) enables re-searchers to automatically explore vast search spaces and find efficient neural networks.
NAS suffers from a key bottleneck, i.e., numerous architectures need to be evaluated during the search process.
We propose the SMEM-NAS, a pairwise com-parison relation-assisted multi-objective evolutionary algorithm based on a multi-population mechanism.
arXiv Detail & Related papers (2024-07-22T12:46:22Z) - Split-Et-Impera: A Framework for the Design of Distributed Deep Learning
Applications [8.434224141580758]
Split-Et-Impera determines the set of the best-split points of a neural network based on deep network interpretability principles.
It performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements.
It suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
arXiv Detail & Related papers (2023-03-22T13:00:00Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - JMSNAS: Joint Model Split and Neural Architecture Search for Learning
over Mobile Edge Networks [23.230079759174902]
Joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network.
Considering both the computing and communication resource constraints, a computational graph search problem is formulated.
Experiment results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.
arXiv Detail & Related papers (2021-11-16T03:10:23Z) - Communication-Efficient Separable Neural Network for Distributed
Inference on Edge Devices [2.28438857884398]
We propose a novel method of exploiting model parallelism to separate a neural network for distributed inferences.
Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated.
arXiv Detail & Related papers (2021-11-03T19:30:28Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Generalized Latency Performance Estimation for Once-For-All Neural
Architecture Search [0.0]
We introduce two generalizability strategies which include fine-tuning using a base model trained on a specific hardware and NAS search space.
We provide a family of latency prediction models that achieve over 50% lower RMSE loss as compared to ProxylessNAS.
arXiv Detail & Related papers (2021-01-04T00:48:09Z) - MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS)
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.