Metronome: Differentiated Delay Scheduling for Serverless Functions
- URL: http://arxiv.org/abs/2512.05703v1
- Date: Fri, 05 Dec 2025 13:30:04 GMT
- Title: Metronome: Differentiated Delay Scheduling for Serverless Functions
- Authors: Zhuangbin Chen, Juzheng Zheng, Zibin Zheng
- Abstract summary: We propose Metronome, a delay scheduling framework that employs predictive mechanisms to identify optimal locality-aware nodes for individual functions. Our implementation on OpenLambda shows that Metronome significantly outperforms baselines, achieving a 64.88%-95.83% reduction in mean execution time for functions.
- Score: 42.99495101001926
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Function-as-a-Service (FaaS) computing is an emerging cloud computing paradigm valued for its ease of management and elasticity. However, optimizing scheduling for serverless functions remains challenging due to their dynamic and event-driven nature. While data locality has been proven effective in traditional cluster computing systems through delay scheduling, its application in serverless platforms remains largely unexplored. In this paper, we systematically evaluate existing delay scheduling methods in serverless environments and identify three key observations: 1) delay scheduling benefits vary significantly based on function input characteristics; 2) serverless computing exhibits more complex locality patterns than cluster computing systems, encompassing both data locality and infrastructure locality; and 3) heterogeneous function execution times make rule-based delay thresholds ineffective. Based on these insights, we propose Metronome, a differentiated delay scheduling framework that employs predictive mechanisms to identify optimal locality-aware nodes for individual functions. Metronome leverages an online Random Forest Regression model to forecast function execution times across various nodes, enabling informed delay decisions while preventing SLA violations. Our implementation on OpenLambda shows that Metronome significantly outperforms baselines, achieving a 64.88%-95.83% reduction in mean execution time for functions, while maintaining performance advantages under increased concurrency levels and ensuring SLA compliance.
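The core decision the abstract describes can be sketched in a few lines: predict each node's execution time, then delay for a locality-aware node only if the predicted total latency still meets the SLA. The sketch below is illustrative only, not the authors' implementation; the node attributes, cost constants, and the linear stand-in predictor are assumptions (the paper uses an online Random Forest Regression model instead).

```python
# Hedged sketch of differentiated delay scheduling in the spirit of Metronome.
# All names and constants are illustrative assumptions, not from the paper.

def predict_exec_time(input_size_mb, node):
    """Stand-in for the learned per-node execution-time model.
    Warm nodes holding the input data run faster (infrastructure
    locality and data locality, respectively)."""
    t = 0.02 * input_size_mb            # compute cost grows with input size
    if not node["warm"]:
        t += 0.5                        # cold-start penalty (infrastructure locality)
    if not node["has_data"]:
        t += 0.01 * input_size_mb       # remote data fetch (data locality)
    return t

def choose_node(input_size_mb, nodes, sla_deadline):
    """Pick the node minimizing predicted queueing delay + execution time,
    but only delay for a busy locality node if the SLA still holds."""
    best = None
    for node in nodes:
        total = node["queue_delay"] + predict_exec_time(input_size_mb, node)
        if total <= sla_deadline and (best is None or total < best[1]):
            best = (node["name"], total)
    if best is None:
        # No node meets the SLA: fall back to the least-loaded node
        # immediately rather than delaying further.
        node = min(nodes, key=lambda n: n["queue_delay"])
        best = (node["name"],
                node["queue_delay"] + predict_exec_time(input_size_mb, node))
    return best

nodes = [
    {"name": "local-warm",  "warm": True,  "has_data": True,  "queue_delay": 0.3},
    {"name": "remote-cold", "warm": False, "has_data": False, "queue_delay": 0.0},
]
# Delaying 0.3s for the warm, data-local node beats the idle cold node.
print(choose_node(100, nodes, sla_deadline=5.0))
```

Under these toy numbers the scheduler tolerates a short queueing delay to land on the locality node, which is the differentiated-delay trade-off the paper's predictive model is meant to make per function.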
Related papers
- TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs [8.818252253980985]
TempoNet is a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets.
arXiv Detail & Related papers (2026-02-20T09:56:23Z) - HALO: Semantic-Aware Distributed LLM Inference in Lossy Edge Network [50.33808558714122]
Large language model (LLM) inference at the edge can facilitate prompt service responsiveness while protecting user privacy. We propose HALO, a novel framework that can boost distributed LLM inference in lossy edge networks. Experimental results from a Raspberry Pi cluster demonstrate that HALO achieves a 3.41x end-to-end speedup for LLaMA-series LLMs under unreliable network conditions.
arXiv Detail & Related papers (2026-01-16T07:37:23Z) - CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems [62.24576366776727]
We propose a latency-aware scheduling framework to minimize total inference latency. We show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
arXiv Detail & Related papers (2025-08-15T07:49:22Z) - Multimodal Remote Inference [14.609320101695575]
We study a two-modality scheduling problem that seeks to minimize the ML model's inference error. We show that both modalities share the same threshold and that the index functions and the threshold can be computed efficiently.
arXiv Detail & Related papers (2025-08-11T02:30:44Z) - Learning Unified System Representations for Microservice Tail Latency Prediction [8.532290784939967]
Microservice architectures have become the de facto standard for building scalable cloud-native applications. Traditional approaches often rely on per-request latency metrics, which are highly sensitive to transient noise. We propose USRFNet, a deep learning network that explicitly separates and models traffic-side and resource-side features.
arXiv Detail & Related papers (2025-08-03T07:46:23Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - SafeTail: Efficient Tail Latency Optimization in Edge Service Scheduling via Computational Redundancy Management [2.707215971599082]
Emerging applications, such as augmented reality, require low-latency computing services with high reliability on user devices.
We introduce SafeTail, a framework that meets both median and tail response time targets, with tail latency defined as latency beyond the 90th percentile threshold.
arXiv Detail & Related papers (2024-08-30T10:17:37Z) - DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z) - Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback [29.177402567437206]
We present a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under limited information of delayed acknowledgements.
We numerically show that the resulting policy outperforms other limited information scheduling strategies.
We show how our approach can optimize real-time parallel processing by using network data provided by Kaggle.
arXiv Detail & Related papers (2021-09-17T13:45:02Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.