Related papers: MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference

URL: http://arxiv.org/abs/2412.04147v1
Date: Thu, 05 Dec 2024 13:19:34 GMT
Title: MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
Authors: Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris,
Abstract summary: We introduce MultiTASC++, a continuously adaptive multi-tenancy-aware scheduler for distributed inference.<n>We demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices.
Score: 4.556037016746581
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cascade systems, consisting of a lightweight model processing all samples and a heavier, high-accuracy model refining challenging samples, have become a widely-adopted distributed inference approach to achieving high accuracy and maintaining a low computational burden for mobile and IoT devices. As intelligent indoor environments, like smart homes, continue to expand, a new scenario emerges, the multi-device cascade. In this setting, multiple diverse devices simultaneously utilize a shared heavy model hosted on a server, often situated within or close to the consumer environment. This work introduces MultiTASC++, a continuously adaptive multi-tenancy-aware scheduler that dynamically controls the forwarding decision functions of devices to optimize system throughput while maintaining high accuracy and low latency. Through extensive experimentation in diverse device environments and with varying server-side models, we demonstrate the scheduler's efficacy in consistently maintaining a targeted satisfaction rate while providing the highest available accuracy across different device tiers and workloads of up to 100 devices. This demonstrates its scalability and efficiency in addressing the unique challenges of collaborative DNN inference in dynamic and diverse IoT environments.

Related papers

WECAR: An End-Edge Collaborative Inference and Training Framework for WiFi-Based Continuous Human Activity Recognition [23.374051991346633]
We propose WECAR, an end-edge collaborative inference and training framework for WiFi-based continuous HAR. We implement WECAR based on heterogeneous hardware using Jetson Nano as edge devices and the ESP32 as end devices. Our experiments across three public WiFi datasets reveal that WECAR not only outperforms several state-of-the-art methods in performance and parameter efficiency, but also achieves a substantial reduction in the model's parameter count post-optimization.
arXiv Detail & Related papers (2025-03-09T03:40:27Z)
CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads [4.556037016746581]
This article addresses the challenges inherent in optimising the execution of deep neural networks (DNNs) on mobile devices. We introduce CARIn, a novel framework designed for the optimised deployment of both single- and multi-DNN applications. We observe a substantial enhancement in the fair treatment of the problem's objectives, reaching 1.92x when compared to single-model designs and up to 10.69x in contrast to the state-of-the-art OODIn framework.
arXiv Detail & Related papers (2024-09-02T09:18:11Z)
Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration [37.456185990843515]
We introduce a Universal On-Device Multi-modal Model Adaptation Framework. The framework features the Fast Domain Adaptor (FDA) hosted in the cloud, providing tailored parameters for the Lightweight Multi-modal Model on devices. Our contributions represent a pioneering solution for on-Device Multi-modal Model Adaptation (DMMA)
arXiv Detail & Related papers (2024-05-21T14:42:18Z)
Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices. We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling. Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge [4.281723404774888]
This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the decision functions of devices. By explicitly considering device forwarding, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods.
arXiv Detail & Related papers (2023-06-22T12:04:49Z)
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge. We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-intelligent edge artificial-latency (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC) We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [69.1412199244903]
We present a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. S SensiX++ operates on two fundamental principles - highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration. We report on the overall throughput and quantified benefits of various automation components of SensiX++ and demonstrate its efficacy to significantly reduce operational complexity and lower the effort to deploy, upgrade, reconfigure and serve embedded models on edge devices.
arXiv Detail & Related papers (2021-09-08T22:06:16Z)
Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC) We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.