AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
- URL: http://arxiv.org/abs/2412.00724v1
- Date: Sun, 01 Dec 2024 08:33:56 GMT
- Title: AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
- Authors: Yuzhan Wang, Sicong Liu, Bin Guo, Boqi Zhang, Ke Ma, Yasan Ding, Hao Luo, Yao Li, Zhiwen Yu
- Abstract summary: We introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts.
AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, cuts inference latency by 1.51x to 6.2x, and lowers energy costs by a factor of 4.69.
- Score: 16.5444553304756
- Abstract: Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the *dynamic* and *diverse* deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, cuts inference latency by 1.51x to 6.2x, and lowers energy costs by a factor of 4.69.
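To make the adaptation loop concrete, below is a minimal Python sketch of one sense-decide-swap cycle. Everything here is a hypothetical illustration under assumed names and numbers (COMPRESSION_VARIANTS, read_available_memory_mb, the budgets), not AdaScale's actual API or compression operators.

```python
# Minimal sketch of an automated adaptation loop: sense the context, pick the
# most accurate pre-compressed variant that fits, and hot-swap it. All names
# and numbers are illustrative assumptions, not AdaScale's implementation.
import random
import time

# Variants built from different compression operator combinations
# (e.g. pruning, low-rank decomposition, 8-bit quantization).
COMPRESSION_VARIANTS = {
    "full":      {"latency_ms": 120, "memory_mb": 480, "accuracy": 0.78},
    "pruned":    {"latency_ms": 70,  "memory_mb": 300, "accuracy": 0.76},
    "pruned+q8": {"latency_ms": 35,  "memory_mb": 140, "accuracy": 0.74},
}

def read_available_memory_mb():
    """Stand-in for the resource-availability awareness block."""
    return random.uniform(100, 600)  # fluctuating free memory

def select_variant(memory_budget_mb, latency_budget_ms):
    """Pick the most accurate variant that fits the current context."""
    feasible = [
        (name, p) for name, p in COMPRESSION_VARIANTS.items()
        if p["memory_mb"] <= memory_budget_mb and p["latency_ms"] <= latency_budget_ms
    ]
    if not feasible:
        return "pruned+q8"  # fall back to the smallest variant
    return max(feasible, key=lambda kv: kv[1]["accuracy"])[0]

def adaptation_loop(latency_budget_ms=80, steps=5):
    active = "full"
    for _ in range(steps):
        free_mem = read_available_memory_mb()               # 1. sense context
        best = select_variant(free_mem, latency_budget_ms)  # 2. re-plan
        if best != active:                                  # 3. hot-swap model
            print(f"free_mem={free_mem:.0f}MB -> switching {active} -> {best}")
            active = best
        time.sleep(0.1)                                     # 4. repeat

adaptation_loop()
```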
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Efficient Federated Intrusion Detection in 5G ecosystem using optimized BERT-based model [0.7100520098029439]
5G offers advanced services, supporting applications such as intelligent transportation, connected healthcare, and smart cities within the Internet of Things (IoT).
These advancements introduce significant security challenges, with increasingly sophisticated cyber-attacks.
This paper proposes a robust intrusion detection system (IDS) using federated learning and large language models (LLMs).
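As a rough illustration of the federated setup this entry describes (local training, only weight updates shared, never raw traffic), here is a toy FedAvg-style loop; the linear model and random data are stand-ins for the paper's BERT-based detector.

```python
# Toy FedAvg sketch: each client trains on private data, the server averages
# the resulting weights. The logistic-regression "model" is a stand-in.
import numpy as np

rng = np.random.default_rng(3)
global_w = np.zeros(8)

def local_step(w, X, y, lr=0.1):
    """One gradient step of logistic regression on a client's private data."""
    p = 1 / (1 + np.exp(-X @ w))
    return w - lr * X.T @ (p - y) / len(y)

# Five clients, each with 20 private (features, label) examples.
clients = [(rng.standard_normal((20, 8)), rng.integers(0, 2, 20)) for _ in range(5)]

for _ in range(10):
    updates = [local_step(global_w.copy(), X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)  # server averages client weights

print("aggregated weights:", np.round(global_w, 2))
```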
arXiv Detail & Related papers (2024-09-28T15:56:28Z) - Latent Neural Cellular Automata for Resource-Efficient Image Restoration [4.470499157873342]
We introduce the Latent Neural Cellular Automata (LNCA) model, a novel architecture designed to address the resource limitations of neural cellular automata.
Our approach shifts the computation from the conventional input space to a specially designed latent space, relying on a pre-trained autoencoder.
This modification not only reduces the model's resource consumption but also maintains a flexible framework suitable for various applications.
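For intuition, a toy sketch of the encode-iterate-decode pattern this summary describes: the expensive autoencoder runs once, and the iterative cellular-automaton updates happen in the cheap latent space. The random encoder, decoder, and update rule below are stand-ins for the pre-trained components, not the LNCA model.

```python
# Toy LNCA-style pipeline: encode once, iterate a CA update rule in a small
# latent space, decode once. All weights are random placeholders.
import numpy as np

rng = np.random.default_rng(4)
enc  = rng.standard_normal((256, 16)) * 0.05  # pre-trained encoder stand-in
dec  = rng.standard_normal((16, 256)) * 0.05  # matching decoder stand-in
rule = rng.standard_normal((16, 16)) * 0.1    # local CA update rule

def restore(image_vec, steps=8):
    z = image_vec @ enc              # 1. encode once (256 -> 16 dims)
    for _ in range(steps):
        z = z + np.tanh(z @ rule)    # 2. cheap iterative updates in latent space
    return z @ dec                   # 3. decode once back to pixel space

out = restore(rng.standard_normal(256))
print(out.shape)  # (256,) -- same size as the input image vector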
arXiv Detail & Related papers (2024-03-22T14:15:28Z) - FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing [5.815300670677979]
We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an asymmetric environment.
Our method achieves a 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy, and is up to 16x faster than offloading with existing standards.
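For intuition, a hedged sketch of split computing with a shallow bottleneck: run a few layers on-device, transfer a small quantized feature tensor, and finish inference server-side. The shapes, the quantization step, and all names are illustrative assumptions, not FrankenSplit's design.

```python
# Split computing sketch: device-side shallow layers + narrow bottleneck,
# server-side tail. Weights and sizes are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
head  = rng.standard_normal((3072, 256)) * 0.02  # shallow on-device layers
bneck = rng.standard_normal((256, 32)) * 0.1     # bottleneck: 256 -> 32 dims
tail  = rng.standard_normal((32, 1000)) * 0.1    # server-side classifier

def device_side(x):
    z = np.maximum(x @ head, 0) @ bneck  # encode to a small feature tensor
    return np.round(z * 16) / 16         # coarse quantization before transfer

def server_side(z):
    return (z @ tail).argmax()           # finish inference in the cloud

x = rng.standard_normal(3072)            # e.g. a flattened 32x32x3 image
z = device_side(x)
print(f"transfer {z.size} floats instead of {x.size}; class = {server_side(z)}")
```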
arXiv Detail & Related papers (2023-02-21T14:03:22Z) - In-situ Model Downloading to Realize Versatile Edge AI in 6G Mobile Networks [61.416494781759326]
In-situ model downloading aims to achieve transparent and real-time replacement of on-device AI models by downloading from an AI library in the network.
A key component of the presented framework is a set of techniques that dynamically compress a downloaded model at the depth-level, parameter-level, or bit-level.
We propose a 6G network architecture customized for deploying in-situ model downloading with the key feature of a three-tier (edge, local, and central) AI library.
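A small sketch of how the three compression knobs this entry mentions (depth, parameter, and bit level) could be searched to fit a download budget; the model sizes and the search order are invented for illustration, not the paper's technique.

```python
# Fit a model download to a link budget by shrinking along the bit-level and
# parameter-level axes (depth-level is left fixed here). Numbers are made up.
def model_download_size_mb(num_layers, params_per_layer, bits_per_param):
    """Payload size after choosing depth (layers kept), parameter count
    (pruning ratio), and quantization width (bits per parameter)."""
    return num_layers * params_per_layer * bits_per_param / 8 / 1e6

def fit_to_link(budget_mb, base=(50, 1_000_000, 32)):
    layers, params, _ = base
    # Shrink along each axis in turn until the payload fits the link budget.
    for bits in (32, 16, 8, 4):                # bit-level compression
        for keep in (1.0, 0.5, 0.25):          # parameter-level compression
            size = model_download_size_mb(layers, int(params * keep), bits)
            if size <= budget_mb:
                return {"layers": layers, "kept_params": keep,
                        "bits": bits, "size_mb": size}
    return None  # no configuration fits

print(fit_to_link(budget_mb=25))
```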
arXiv Detail & Related papers (2022-10-07T13:41:15Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit-aware scheduling algorithm that allows sample preemption at run time, to account for the dynamicity introduced by the arrival and exiting processes.
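A hedged sketch of what exit-aware preemption can look like: execution proceeds in stage-sized slices, so the scheduler can switch to a more urgent request at any exit boundary. The stage costs and the deadline-priority policy below are invented, not Fluid Batching's actual scheduler.

```python
# Exit-aware preemptive serving sketch: work is scheduled one stage slice at a
# time, and a running sample can be preempted between stages (at exits).
import heapq

STAGE_MS = 5     # each backbone stage (ending in an exit) costs 5 ms
NUM_STAGES = 4

def serve(arrivals):
    """arrivals: [(arrival_ms, deadline_ms, sample_id), ...] sorted by arrival."""
    now, ready, i, log = 0, [], 0, []
    while i < len(arrivals) or ready:
        while i < len(arrivals) and arrivals[i][0] <= now:
            _, dl, sid = arrivals[i]
            heapq.heappush(ready, (dl, sid, 0))  # priority = deadline
            i += 1
        if not ready:
            now = arrivals[i][0]                 # idle until next arrival
            continue
        dl, sid, stage = heapq.heappop(ready)    # run the most urgent sample
        now += STAGE_MS                          # execute one stage slice
        if stage + 1 == NUM_STAGES:
            log.append((sid, now))               # finished (or exited early)
        else:
            heapq.heappush(ready, (dl, sid, stage + 1))  # preemptible re-queue
    return log

# "b" arrives later but is more urgent, so it preempts "a" at an exit boundary.
print(serve([(0, 40, "a"), (2, 12, "b")]))
```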
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
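For intuition, a toy multi-exit forward pass with a confidence-threshold exit policy, simplified from segmentation to classification; the stage and head shapes are made up, not the MESS architecture.

```python
# Multi-exit inference sketch: run backbone stages in order and stop as soon
# as an attached exit head is confident enough. All weights are random.
import numpy as np

rng = np.random.default_rng(1)
stages = [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)]  # backbone blocks
heads  = [rng.standard_normal((16, 5)) for _ in range(4)]         # per-exit classifiers

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_exit_predict(x, threshold=0.6):
    """Return early once an exit head is confident (the exit policy)."""
    h = x
    for i, (block, head) in enumerate(zip(stages, heads)):
        h = np.tanh(h @ block)        # next backbone stage
        probs = softmax(h @ head)     # attached exit head
        if probs.max() >= threshold:  # easy sample: stop early
            return probs.argmax(), i
    return probs.argmax(), len(stages) - 1  # hard sample: run to the end

pred, exit_idx = multi_exit_predict(rng.standard_normal(16))
print(f"predicted class {pred} at exit {exit_idx}")
```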
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of device- and cloud-side execution to address several challenges.
We show that DynO outperforms the current state of the art, improving throughput by more than an order of magnitude compared to device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net achieves dynamic inference through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
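A toy sketch of dynamic slimming: a gate picks, per input, how many output channels of a layer to execute. The gate heuristic and the width choices below are illustrative stand-ins, not DS-Net's double-headed gate.

```python
# Dynamic slimming sketch: route easy inputs through a thin slice of the layer
# and hard inputs through the full width. Weights and thresholds are made up.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # full-width weight matrix (in=64, out=128)

def gate(x):
    """Gate stand-in: map a crude input-difficulty proxy to a width."""
    difficulty = float(np.abs(x).mean())
    if difficulty < 0.7:
        return 32    # easy input: thin sub-network
    if difficulty < 0.9:
        return 64    # medium input
    return 128       # hard input: full width

def slimmable_forward(x):
    k = gate(x)            # choose a width at run time
    y = x @ W[:, :k]       # execute only the first k output channels
    return y, k

x = rng.standard_normal(64)
y, k = slimmable_forward(x)
print(f"ran with width {k}, output shape {y.shape}")
```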
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model
Compression for Mobile Applications [15.134752032646231]
We present AdaSpring, a context-adaptive and self-evolutionary DNN compression framework.
It enables runtime-adaptive DNN compression, locally and online.
Experimental results show that AdaSpring achieves up to a 3.1x latency reduction and a 4.2x energy-efficiency improvement in DNNs.
arXiv Detail & Related papers (2021-01-28T03:30:04Z)