Transformer-Based Model for Cold Start Mitigation in FaaS Architecture
- URL: http://arxiv.org/abs/2504.11338v1
- Date: Tue, 15 Apr 2025 16:12:07 GMT
- Title: Transformer-Based Model for Cold Start Mitigation in FaaS Architecture
- Authors: Alexandre Savi Fayam Mbala Mouen, Jerry Lacmou Zeutouo, Vianney Kengne Tchendji
- Abstract summary: Cold start occurs when an idle FaaS function is invoked, requiring a full initialization process, which increases latency and degrades user experience. Existing solutions for cold start mitigation are limited in terms of invocation pattern generalization and implementation complexity. We propose an innovative approach leveraging Transformer models to mitigate the impact of cold starts in FaaS architectures.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Serverless architectures, particularly the Function as a Service (FaaS) model, have become a cornerstone of modern cloud computing due to their ability to simplify resource management and enhance application deployment agility. However, a significant challenge remains: the cold start problem. This phenomenon occurs when an idle FaaS function is invoked, requiring a full initialization process, which increases latency and degrades user experience. Existing solutions for cold start mitigation are limited in terms of invocation pattern generalization and implementation complexity. In this study, we propose an innovative approach leveraging Transformer models to mitigate the impact of cold starts in FaaS architectures. Our solution excels in accurately modeling function initialization delays and optimizing serverless system performance. Experimental evaluation using a public dataset provided by Azure demonstrates a significant reduction in cold start times, reaching up to 79% compared to conventional methods.
Related papers
- EFC++: Elastic Feature Consolidation with Prototype Re-balancing for Cold Start Exemplar-free Incremental Learning [17.815956928177638]
We consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone.
This is especially challenging for EFCIL since it requires high plasticity, resulting in feature drift.
We propose an effective approach to consolidate feature representations by regularizing drift in directions highly relevant to previous tasks.
arXiv Detail & Related papers (2025-03-13T15:01:19Z)
- Causal Context Adjustment Loss for Learned Image Compression [72.7300229848778]
In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance.
Most present techniques are VAE-based with an autoregressive entropy model, which obviously promotes the RD performance by utilizing the decoded causal context.
In this paper, we make the first attempt in investigating the way to explicitly adjust the causal context with our proposed Causal Context Adjustment loss.
arXiv Detail & Related papers (2024-10-07T09:08:32Z)
- SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions [31.01399126339857]
Serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources.
Existing solutions tend to use over-simplistic strategies for function pre-loading/unloading without full invocation pattern exploitation.
We propose SPES, the first differentiated scheduler for runtime cold start mitigation by optimizing serverless function provision.
arXiv Detail & Related papers (2024-03-26T10:28:41Z)
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computational redundancy part of the network.
We then prune the redundancy blocks of the model and maintain the network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- On-demand Cold Start Frequency Reduction with Off-Policy Reinforcement Learning in Serverless Computing [18.36339203254509]
The presented work focuses on reducing the frequent, on-demand cold starts on the platform by using Reinforcement Learning (RL).
The proposed approach uses model-free Q-learning that considers function metrics such as CPU utilization, existing function instances, and response failure rate to proactively initialize functions in advance.
The evaluation results demonstrate a favourable performance of the RL-based agent when compared to Kubeless' default policy and a function keep-alive policy.
arXiv Detail & Related papers (2023-08-15T03:01:41Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing [0.0]
Federated learning (FL) is a recently developed area of machine learning.
In this paper, a novel scheme based on the notion of "model growing" is proposed.
The proposed approach is tested extensively on three standard benchmarks and is shown to achieve a substantial reduction in communication and client computation.
arXiv Detail & Related papers (2022-07-19T21:54:53Z)
- DualCF: Efficient Model Extraction Attack from Counterfactual Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms to allow users to access large-scale cloud-based models via APIs.
Such extra information inevitably causes the cloud models to be more vulnerable to extraction attacks.
We propose a novel, simple yet efficient querying strategy that greatly improves the query efficiency of stealing a classification model.
arXiv Detail & Related papers (2022-05-13T08:24:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.