Multi-Dimensional Autoscaling of Stream Processing Services on Edge Devices
- URL: http://arxiv.org/abs/2510.06882v1
- Date: Wed, 08 Oct 2025 10:51:50 GMT
- Title: Multi-Dimensional Autoscaling of Stream Processing Services on Edge Devices
- Authors: Boris Sedlak, Philipp Raith, Andrea Morichetta, Víctor Casamayor Pujol, Schahram Dustdar
- Abstract summary: We introduce a Multi-dimensional Autoscaling Platform (MUDAP) that supports fine-grained vertical scaling across both service- and resource-level dimensions. We present a scaling agent based on Regression Analysis of Structural Knowledge (RASK) to optimize execution across services.
- Score: 5.831429356033195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Edge devices have limited resources, which inevitably leads to situations where stream processing services cannot satisfy their needs. While existing autoscaling mechanisms focus entirely on resource scaling, Edge devices require alternative ways to sustain the Service Level Objectives (SLOs) of competing services. To address these issues, we introduce a Multi-dimensional Autoscaling Platform (MUDAP) that supports fine-grained vertical scaling across both service- and resource-level dimensions. MUDAP supports service-specific scaling tailored to available parameters, e.g., scale data quality or model size for a particular service. To optimize the execution across services, we present a scaling agent based on Regression Analysis of Structural Knowledge (RASK). The RASK agent efficiently explores the solution space and learns a continuous regression model of the processing environment for inferring optimal scaling actions. We compared our approach with two autoscalers, the Kubernetes VPA and a reinforcement learning agent, for scaling up to 9 services on a single Edge device. Our results showed that RASK can infer an accurate regression model in merely 20 iterations (i.e., observe 200s of processing). By increasingly adding elasticity dimensions, RASK sustained the highest request load with 28% fewer SLO violations compared to baselines.
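The core loop the abstract describes (explore scaling actions, fit a continuous regression model of the environment, then pick the action with the best predicted outcome) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the environment function, the linear model, the three elasticity dimensions, and all names are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe_throughput(action):
    # Stand-in for the real processing environment: throughput responds
    # to each elasticity dimension. Purely illustrative, not from the paper.
    cpu, data_quality, model_size = action
    return 10 * cpu + 5 * (1 - data_quality) + 3 * (1 - model_size) + rng.normal(0, 0.1)

# 1. Explore: sample scaling actions across three hypothetical elasticity
#    dimensions [cpu_share, data_quality, model_size] and record outcomes.
X = rng.uniform(0, 1, size=(20, 3))          # ~20 iterations, as in the paper
y = np.array([observe_throughput(a) for a in X])

# 2. Learn a continuous regression model of the environment
#    (here: ordinary least squares with an intercept term).
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# 3. Infer a scaling action: evaluate candidate actions on the learned
#    model and pick the one with the best predicted throughput.
candidates = rng.uniform(0, 1, size=(100, 3))
preds = np.hstack([candidates, np.ones((100, 1))]) @ coef
best_action = candidates[np.argmax(preds)]
print("chosen scaling action:", best_action)
```

The actual RASK agent additionally balances multiple competing services and SLOs; the sketch only shows the explore-fit-infer structure for a single objective.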
Related papers
- FLAS: a combination of proactive and reactive auto-scaling architecture for distributed services [0.0]
We present FLAS (Forecasted Load Auto-Scaling), an auto-scaler for distributed services. It combines the advantages of proactive and reactive approaches according to the situation to decide the optimal scaling actions. We provide a FLAS implementation for the use case of a content-based publish-subscribe distributed system.
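The proactive/reactive combination that FLAS describes can be illustrated with a toy autoscaler: react to the current load when it crosses a threshold, and additionally act ahead of time when a forecast predicts a crossing. The class, thresholds, and naive linear forecast below are illustrative assumptions, not the FLAS design.

```python
from collections import deque

class HybridAutoscaler:
    """Toy mix of reactive (threshold) and proactive (forecast) scaling,
    in the spirit of FLAS. All details here are illustrative assumptions."""

    def __init__(self, high=0.8, low=0.3, window=5):
        self.high, self.low = high, low
        self.history = deque(maxlen=window)

    def forecast(self):
        # Naive linear extrapolation of recent load; a real system would
        # use a proper time-series forecast.
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        slope = (self.history[-1] - self.history[0]) / (len(self.history) - 1)
        return self.history[-1] + slope

    def decide(self, load):
        self.history.append(load)
        if load > self.high:                 # reactive: the spike is here
            return "scale_out"
        if self.forecast() > self.high:      # proactive: act before the spike
            return "scale_out"
        if load < self.low and self.forecast() < self.low:
            return "scale_in"
        return "hold"
```

For example, feeding a rising load series (0.4, 0.55, 0.7) triggers a proactive scale-out before the high threshold is actually crossed.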
arXiv Detail & Related papers (2025-10-23T09:38:07Z) - TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform [60.378160142579]
It is crucial to consider computing costs when deploying on target platforms like the NVIDIA® DRIVE PX 2. Our objective is to customize the semantic segmentation network according to the computing power and specific scenarios of autonomous driving hardware.
arXiv Detail & Related papers (2025-08-17T08:09:13Z) - PRISM: Distributed Inference for Foundation Models at Edge [73.54372283220444]
PRISM is a communication-efficient and compute-aware strategy for distributed Transformer inference on edge devices. We evaluate PRISM on ViT, BERT, and GPT-2 across diverse datasets.
arXiv Detail & Related papers (2025-07-16T11:25:03Z) - Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods [5.201504495733271]
This work introduces an agent-based autoscaling framework to maximize requirements fulfillment in constrained environments. We compare four types of scaling agents: Active Inference, Deep Q Network, Analysis of Structural Knowledge, and Deep Active Inference.
arXiv Detail & Related papers (2025-06-12T07:20:26Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal awareness to address these challenges. We propose an adaptive auto-scaling framework, STaleX, that integrates spatiotemporal features, enabling real-time resource adjustments. Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload.
arXiv Detail & Related papers (2025-01-30T20:19:13Z) - DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices.
arXiv Detail & Related papers (2023-09-02T08:22:21Z) - V2N Service Scaling with Deep Reinforcement Learning [0.0]
We employ Deep Reinforcement Learning (DRL) for vertical scaling in Edge computing to support vehicular-to-network communications.
We show that DDPG outperforms existing solutions, reducing the average number of active CPUs by 23% while increasing the long-term reward by 24%.
arXiv Detail & Related papers (2023-01-30T23:13:18Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.