AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
- URL: http://arxiv.org/abs/2412.00724v1
- Date: Sun, 01 Dec 2024 08:33:56 GMT
- Title: AdaScale: Dynamic Context-aware DNN Scaling via Automated Adaptation Loop on Mobile Devices
- Authors: Yuzhan Wang, Sicong Liu, Bin Guo, Boqi Zhang, Ke Ma, Yasan Ding, Hao Luo, Yao Li, Zhiwen Yu
- Abstract summary: We introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts.
AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, cuts inference latency by 1.51x to 6.2x, and lowers energy costs by a factor of 4.69.
- Score: 16.5444553304756
- Abstract: Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the *dynamic* and *diverse* deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale improves accuracy by 5.09%, reduces training overhead by 66.89%, cuts inference latency by 1.51x to 6.2x, and lowers energy costs by a factor of 4.69.
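To make the adaptation loop concrete, below is a minimal Python sketch of one sense-decide-swap cycle. Everything here is a hypothetical illustration under assumed names and numbers (COMPRESSION_VARIANTS, read_available_memory_mb, the budgets), not AdaScale's actual API or compression operators.

```python
# Minimal sketch of an automated adaptation loop: sense the context, pick the
# most accurate pre-compressed variant that fits, and hot-swap it. All names
# and numbers are illustrative assumptions, not AdaScale's implementation.
import random
import time

# Variants built from different compression operator combinations
# (e.g. pruning, low-rank decomposition, 8-bit quantization).
COMPRESSION_VARIANTS = {
    "full":      {"latency_ms": 120, "memory_mb": 480, "accuracy": 0.78},
    "pruned":    {"latency_ms": 70,  "memory_mb": 300, "accuracy": 0.76},
    "pruned+q8": {"latency_ms": 35,  "memory_mb": 140, "accuracy": 0.74},
}

def read_available_memory_mb():
    """Stand-in for the resource-availability awareness block."""
    return random.uniform(100, 600)  # fluctuating free memory

def select_variant(memory_budget_mb, latency_budget_ms):
    """Pick the most accurate variant that fits the current context."""
    feasible = [
        (name, p) for name, p in COMPRESSION_VARIANTS.items()
        if p["memory_mb"] <= memory_budget_mb and p["latency_ms"] <= latency_budget_ms
    ]
    if not feasible:
        return "pruned+q8"  # fall back to the smallest variant
    return max(feasible, key=lambda kv: kv[1]["accuracy"])[0]

def adaptation_loop(latency_budget_ms=80, steps=5):
    active = "full"
    for _ in range(steps):
        free_mem = read_available_memory_mb()               # 1. sense context
        best = select_variant(free_mem, latency_budget_ms)  # 2. re-plan
        if best != active:                                  # 3. hot-swap model
            print(f"free_mem={free_mem:.0f}MB -> switching {active} -> {best}")
            active = best
        time.sleep(0.1)                                     # 4. repeat

adaptation_loop()
```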
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Efficient Federated Intrusion Detection in 5G ecosystem using optimized BERT-based model [0.7100520098029439]
5G offers advanced services, supporting applications such as intelligent transportation, connected healthcare, and smart cities within the Internet of Things (IoT).
These advancements introduce significant security challenges, with increasingly sophisticated cyber-attacks.
This paper proposes a robust intrusion detection system (IDS) using federated learning and large language models (LLMs).
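As a rough illustration of the federated setup this entry describes (local training, only weight updates shared, never raw traffic), here is a toy FedAvg-style loop; the linear model and random data are stand-ins for the paper's BERT-based detector.

```python
# Toy FedAvg sketch: each client trains on private data, the server averages
# the resulting weights. The logistic-regression "model" is a stand-in.
import numpy as np

rng = np.random.default_rng(3)
global_w = np.zeros(8)

def local_step(w, X, y, lr=0.1):
    """One gradient step of logistic regression on a client's private data."""
    p = 1 / (1 + np.exp(-X @ w))
    return w - lr * X.T @ (p - y) / len(y)

# Five clients, each with 20 private (features, label) examples.
clients = [(rng.standard_normal((20, 8)), rng.integers(0, 2, 20)) for _ in range(5)]

for _ in range(10):
    updates = [local_step(global_w.copy(), X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)  # server averages client weights

print("aggregated weights:", np.round(global_w, 2))
```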
arXiv Detail & Related papers (2024-09-28T15:56:28Z) - Latent Neural Cellular Automata for Resource-Efficient Image Restoration [4.470499157873342]
We introduce the Latent Neural Cellular Automata (LNCA) model, a novel architecture designed to address the resource limitations of neural cellular automata.
Our approach shifts the computation from the conventional input space to a specially designed latent space, relying on a pre-trained autoencoder.
This modification not only reduces the model's resource consumption but also maintains a flexible framework suitable for various applications.
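For intuition, a toy sketch of the encode-iterate-decode pattern this summary describes: the expensive autoencoder runs once, and the iterative cellular-automaton updates happen in the cheap latent space. The random encoder, decoder, and update rule below are stand-ins for the pre-trained components, not the LNCA model.

```python
# Toy LNCA-style pipeline: encode once, iterate a CA update rule in a small
# latent space, decode once. All weights are random placeholders.
import numpy as np

rng = np.random.default_rng(4)
enc  = rng.standard_normal((256, 16)) * 0.05  # pre-trained encoder stand-in
dec  = rng.standard_normal((16, 256)) * 0.05  # matching decoder stand-in
rule = rng.standard_normal((16, 16)) * 0.1    # local CA update rule

def restore(image_vec, steps=8):
    z = image_vec @ enc              # 1. encode once (256 -> 16 dims)
    for _ in range(steps):
        z = z + np.tanh(z @ rule)    # 2. cheap iterative updates in latent space
    return z @ dec                   # 3. decode once back to pixel space

out = restore(rng.standard_normal(256))
print(out.shape)  # (256,) -- same size as the input image vector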
arXiv Detail & Related papers (2024-03-22T14:15:28Z) - FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing [5.815300670677979]
We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an asymmetric environment.
Our method achieves a 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy, and is up to 16x faster than offloading with existing standards.
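For intuition, a hedged sketch of split computing with a shallow bottleneck: run a few layers on-device, transfer a small quantized feature tensor, and finish inference server-side. The shapes, the quantization step, and all names are illustrative assumptions, not FrankenSplit's design.

```python
# Split computing sketch: device-side shallow layers + narrow bottleneck,
# server-side tail. Weights and sizes are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
head  = rng.standard_normal((3072, 256)) * 0.02  # shallow on-device layers
bneck = rng.standard_normal((256, 32)) * 0.1     # bottleneck: 256 -> 32 dims
tail  = rng.standard_normal((32, 1000)) * 0.1    # server-side classifier

def device_side(x):
    z = np.maximum(x @ head, 0) @ bneck  # encode to a small feature tensor
    return np.round(z * 16) / 16         # coarse quantization before transfer

def server_side(z):
    return (z @ tail).argmax()           # finish inference in the cloud

x = rng.standard_normal(3072)            # e.g. a flattened 32x32x3 image
z = device_side(x)
print(f"transfer {z.size} floats instead of {x.size}; class = {server_side(z)}")
```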
arXiv Detail & Related papers (2023-02-21T14:03:22Z) - In-situ Model Downloading to Realize Versatile Edge AI in 6G Mobile Networks [61.416494781759326]
In-situ model downloading aims to achieve transparent and real-time replacement of on-device AI models by downloading from an AI library in the network.
A key component of the presented framework is a set of techniques that dynamically compress a downloaded model at the depth-level, parameter-level, or bit-level.
We propose a 6G network architecture customized for deploying in-situ model downloading with the key feature of a three-tier (edge, local, and central) AI library.
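A small sketch of how the three compression knobs this entry mentions (depth, parameter, and bit level) could be searched to fit a download budget; the model sizes and the search order are invented for illustration, not the paper's technique.

```python
# Fit a model download to a link budget by shrinking along the bit-level and
# parameter-level axes (depth-level is left fixed here). Numbers are made up.
def model_download_size_mb(num_layers, params_per_layer, bits_per_param):
    """Payload size after choosing depth (layers kept), parameter count
    (pruning ratio), and quantization width (bits per parameter)."""
    return num_layers * params_per_layer * bits_per_param / 8 / 1e6

def fit_to_link(budget_mb, base=(50, 1_000_000, 32)):
    layers, params, _ = base
    # Shrink along each axis in turn until the payload fits the link budget.
    for bits in (32, 16, 8, 4):                # bit-level compression
        for keep in (1.0, 0.5, 0.25):          # parameter-level compression
            size = model_download_size_mb(layers, int(params * keep), bits)
            if size <= budget_mb:
                return {"layers": layers, "kept_params": keep,
                        "bits": bits, "size_mb": size}
    return None  # no configuration fits

print(fit_to_link(budget_mb=25))
```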
arXiv Detail & Related papers (2022-10-07T13:41:15Z) - Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit-aware scheduling algorithm that allows sample preemption at run time, to account for the dynamicity introduced by the arrival and exiting processes.
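A hedged sketch of what exit-aware preemption can look like: execution proceeds in stage-sized slices, so the scheduler can switch to a more urgent request at any exit boundary. The stage costs and the deadline-priority policy below are invented, not Fluid Batching's actual scheduler.

```python
# Exit-aware preemptive serving sketch: work is scheduled one stage slice at a
# time, and a running sample can be preempted between stages (at exits).
import heapq

STAGE_MS = 5     # each backbone stage (ending in an exit) costs 5 ms
NUM_STAGES = 4

def serve(arrivals):
    """arrivals: [(arrival_ms, deadline_ms, sample_id), ...] sorted by arrival."""
    now, ready, i, log = 0, [], 0, []
    while i < len(arrivals) or ready:
        while i < len(arrivals) and arrivals[i][0] <= now:
            _, dl, sid = arrivals[i]
            heapq.heappush(ready, (dl, sid, 0))  # priority = deadline
            i += 1
        if not ready:
            now = arrivals[i][0]                 # idle until next arrival
            continue
        dl, sid, stage = heapq.heappop(ready)    # run the most urgent sample
        now += STAGE_MS                          # execute one stage slice
        if stage + 1 == NUM_STAGES:
            log.append((sid, now))               # finished (or exited early)
        else:
            heapq.heappush(ready, (dl, sid, stage + 1))  # preemptible re-queue
    return log

# "b" arrives later but is more urgent, so it preempts "a" at an exit boundary.
print(serve([(0, 40, "a"), (2, 12, "b")]))
```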
arXiv Detail & Related papers (2022-09-27T15:04:01Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
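For intuition, a toy multi-exit forward pass with a confidence-threshold exit policy, simplified from segmentation to classification; the stage and head shapes are made up, not the MESS architecture.

```python
# Multi-exit inference sketch: run backbone stages in order and stop as soon
# as an attached exit head is confident enough. All weights are random.
import numpy as np

rng = np.random.default_rng(1)
stages = [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)]  # backbone blocks
heads  = [rng.standard_normal((16, 5)) for _ in range(4)]         # per-exit classifiers

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_exit_predict(x, threshold=0.6):
    """Return early once an exit head is confident (the exit policy)."""
    h = x
    for i, (block, head) in enumerate(zip(stages, heads)):
        h = np.tanh(h @ block)        # next backbone stage
        probs = softmax(h @ head)     # attached exit head
        if probs.max() >= threshold:  # easy sample: stop early
            return probs.argmax(), i
    return probs.argmax(), len(stages) - 1  # hard sample: run to the end

pred, exit_idx = multi_exit_predict(rng.standard_normal(16))
print(f"predicted class {pred} at exit {exit_idx}")
```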
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device [17.43467167013752]
We present DynO, a distributed inference framework that combines the best of device- and cloud-side execution to address several challenges.
We show that DynO outperforms the current state of the art, improving throughput by more than an order of magnitude compared to device-only execution.
arXiv Detail & Related papers (2021-04-20T13:20:15Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net achieves dynamic inference through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
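A toy sketch of dynamic slimming: a gate picks, per input, how many output channels of a layer to execute. The gate heuristic and the width choices below are illustrative stand-ins, not DS-Net's double-headed gate.

```python
# Dynamic slimming sketch: route easy inputs through a thin slice of the layer
# and hard inputs through the full width. Weights and thresholds are made up.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # full-width weight matrix (in=64, out=128)

def gate(x):
    """Gate stand-in: map a crude input-difficulty proxy to a width."""
    difficulty = float(np.abs(x).mean())
    if difficulty < 0.7:
        return 32    # easy input: thin sub-network
    if difficulty < 0.9:
        return 64    # medium input
    return 128       # hard input: full width

def slimmable_forward(x):
    k = gate(x)            # choose a width at run time
    y = x @ W[:, :k]       # execute only the first k output channels
    return y, k

x = rng.standard_normal(64)
y, k = slimmable_forward(x)
print(f"ran with width {k}, output shape {y.shape}")
```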
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - AdaSpring: Context-adaptive and Runtime-evolutionary Deep Model
Compression for Mobile Applications [15.134752032646231]
We present AdaSpring, a context-adaptive and self-evolutionary DNN compression framework.
It enables runtime-adaptive DNN compression, locally and online.
Experimental results show that AdaSpring achieves up to a 3.1x latency reduction and a 4.2x energy-efficiency improvement in DNNs.
arXiv Detail & Related papers (2021-01-28T03:30:04Z)