Related papers: JMSNAS: Joint Model Split and Neural Architecture Search for Learning over Mobile Edge Networks

JMSNAS: Joint Model Split and Neural Architecture Search for Learning over Mobile Edge Networks

URL: http://arxiv.org/abs/2111.08206v1
Date: Tue, 16 Nov 2021 03:10:23 GMT
Title: JMSNAS: Joint Model Split and Neural Architecture Search for Learning over Mobile Edge Networks
Authors: Yuqing Tian, Zhaoyang Zhang, Zhaohui Yang, Qianqian Yang
Abstract summary: Joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network. Considering both the computing and communication resource constraints, a computational graph search problem is formulated. Experiment results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.
Score: 23.230079759174902
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: The main challenge to deploy deep neural network (DNN) over a mobile edge network is how to split the DNN model so as to match the network architecture as well as all the nodes' computation and communication capacity. This essentially involves two highly coupled procedures: model generating and model splitting. In this paper, a joint model split and neural architecture search (JMSNAS) framework is proposed to automatically generate and deploy a DNN model over a mobile edge network. Considering both the computing and communication resource constraints, a computational graph search problem is formulated to find the multi-split points of the DNN model, and then the model is trained to meet some accuracy requirements. Moreover, the trade-off between model accuracy and completion latency is achieved through the proper design of the objective function. The experiment results confirm the superiority of the proposed framework over the state-of-the-art split machine learning design methods.

Related papers

Exploring Neural Network Pruning with Screening Methods [3.443622476405787]
Modern deep learning models have tens of millions of parameters which makes the inference processes resource-intensive. This paper proposes and evaluates a network pruning framework that eliminates non-essential parameters. The proposed framework produces competitive lean networks compared to the original networks.
arXiv Detail & Related papers (2025-02-11T02:31:04Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals [58.83169560132308]
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks.
arXiv Detail & Related papers (2024-07-18T17:59:01Z)
An Attempt to Devise a Pairwise Ising-Type Maximum Entropy Model Integrated Cost Function for Optimizing SNN Deployment [0.0]
A spiking neural network (SNN) deployment process often involves partitioning the neural network onto processing units within the neuromorphic hardware. Finding optimal deployment schemes is an NP-hard problem. These objectives require consideration of network dynamics shaped by neuron activity patterns. Our approach focuses on network dynamics, which are hardware-independent and can be modeled separately from specific hardware configurations.
arXiv Detail & Related papers (2024-07-09T16:33:43Z)
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
Split-Et-Impera: A Framework for the Design of Distributed Deep Learning Applications [8.434224141580758]
Split-Et-Impera determines the set of the best-split points of a neural network based on deep network interpretability principles. It performs a communication-aware simulation for the rapid evaluation of different neural network rearrangements. It suggests the best match between the quality of service requirements of the application and the performance in terms of accuracy and latency time.
arXiv Detail & Related papers (2023-03-22T13:00:00Z)
Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems. In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks. This paper proposes a neural architecture search (NAS) method for split computing.
arXiv Detail & Related papers (2022-08-30T03:15:43Z)
Optimal Model Placement and Online Model Splitting for Device-Edge Co-Inference [22.785214118527872]
Device-edge co-inference opens up new possibilities for resource-constrained wireless devices to execute deep neural network (DNN)-based applications. We study the joint optimization of the model placement and online model splitting decisions to minimize the energy-and-time cost of device-edge co-inference.
arXiv Detail & Related papers (2021-05-28T06:55:04Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper- parameters of state-of-the-art factored time delay neural networks (TDNNs) These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training. Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet. Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs) Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks. We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.