Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing
- URL: http://arxiv.org/abs/2602.23935v1
- Date: Fri, 27 Feb 2026 11:35:15 GMT
- Title: Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing
- Authors: Bowen Sun, Christos D. Antonopoulos, Evgenia Smirni, Bin Ren, Nikolaos Bellas, Spyros Lalis
- Abstract summary: Serverless computing simplifies cloud deployment but introduces new challenges in managing service latency and carbon emissions. We present LACE-RL, a latency-aware and carbon-efficient management framework. We show that LACE-RL reduces cold starts by 51.69% and idle keep-alive carbon emissions by 77.08% compared to Huawei's static policy.
- Score: 12.749575649611643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Serverless computing simplifies cloud deployment but introduces new challenges in managing service latency and carbon emissions. Reducing cold-start latency requires retaining warm function instances, while minimizing carbon emissions favors reclaiming idle resources. This balance is further complicated by time-varying grid carbon intensity and varying workload patterns, under which static keep-alive policies are inefficient. We present LACE-RL, a latency-aware and carbon-efficient management framework that formulates serverless pod retention as a sequential decision problem. LACE-RL uses deep reinforcement learning to dynamically tune keep-alive durations, jointly modeling cold-start probability, function-specific latency costs, and real-time carbon intensity. Using the Huawei Public Cloud Trace, we show that LACE-RL reduces cold starts by 51.69% and idle keep-alive carbon emissions by 77.08% compared to Huawei's static policy, while achieving better latency-carbon trade-offs than state-of-the-art heuristic and single-objective baselines, approaching Oracle performance.
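The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not LACE-RL and is not taken from the paper: it assumes Poisson arrivals (exponential gaps between invocations), a fixed idle power draw per warm pod, and a simple weighted sum of expected cold-start delay and idle carbon. All function and parameter names are hypothetical.

```python
import math


def cold_start_probability(keep_alive_s, arrival_rate_per_s):
    """P(next invocation arrives after the pod was reclaimed).

    Assumption: inter-arrival gaps are exponential with the given rate.
    """
    return math.exp(-arrival_rate_per_s * keep_alive_s)


def expected_idle_seconds(keep_alive_s, arrival_rate_per_s):
    """E[min(gap, keep_alive)] for an exponential gap: time the pod sits warm but idle."""
    return (1.0 - math.exp(-arrival_rate_per_s * keep_alive_s)) / arrival_rate_per_s


def keep_alive_cost(keep_alive_s, arrival_rate_per_s, cold_start_penalty_s,
                    idle_power_w, carbon_intensity_g_per_kwh,
                    latency_weight=1.0, carbon_weight=1.0):
    """Weighted sum of expected cold-start delay (s) and expected idle carbon (gCO2)."""
    p_cold = cold_start_probability(keep_alive_s, arrival_rate_per_s)
    latency_cost = p_cold * cold_start_penalty_s
    idle_s = expected_idle_seconds(keep_alive_s, arrival_rate_per_s)
    idle_energy_kwh = idle_power_w * idle_s / 3.6e6  # watt-seconds (joules) to kWh
    carbon_g = idle_energy_kwh * carbon_intensity_g_per_kwh
    return latency_weight * latency_cost + carbon_weight * carbon_g


def best_keep_alive(candidates_s, **params):
    """Pick the candidate keep-alive duration with the lowest modeled cost."""
    return min(candidates_s, key=lambda k: keep_alive_cost(k, **params))


if __name__ == "__main__":
    params = dict(arrival_rate_per_s=0.01, cold_start_penalty_s=2.0,
                  idle_power_w=10.0, carbon_intensity_g_per_kwh=400.0)
    print(best_keep_alive([0, 60, 300, 600], **params))
```

Under the memoryless arrival assumption, this cost is linear in `exp(-rate * keep_alive)`, so the best candidate always sits at one end of the range. Bursty, non-stationary workloads and time-varying carbon intensity do not fit this simplification, so the toy model is not a stand-in for the learned policy the abstract describes.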
Related papers
- CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems [62.24576366776727]
We propose a latency-aware scheduling framework to minimize total inference latency. We show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
arXiv Detail & Related papers (2025-08-15T07:49:22Z) - Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization [48.70916202664808]
DiffCarl is a diffusion-modeled carbon- and risk-aware reinforcement learning algorithm for intelligent operation of multi-microgrid systems. It outperforms classic algorithms and state-of-the-art DRL solutions, with 2.3-30.1% lower operational cost. It also achieves 28.7% lower carbon emissions than those of its carbon-unaware variant and reduces performance variability.
arXiv Detail & Related papers (2025-07-22T03:27:07Z) - CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices [0.44784055850794474]
Large Language Models (LLMs) enable real-time function calling in edge AI systems but introduce significant computational overhead, leading to high power consumption and carbon emissions. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized adaptation. Experiments on an NVIDIA Jetson AGX Orin show that CarbonCall reduces carbon emissions by up to 52%, power consumption by 30%, and execution time by 30%, while maintaining high efficiency.
arXiv Detail & Related papers (2025-04-29T01:37:08Z) - ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving [61.35068981176018]
ConServe is a large language model (LLM) serving system that achieves high throughput and strong online latency guarantees. We show that ConServe delivers an average of 2.2$\times$ higher throughput and reduces online serving tail latency by 2.9$\times$ on average compared to state-of-the-art systems.
arXiv Detail & Related papers (2024-10-02T04:12:13Z) - Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce carbon emissions of Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z) - LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand [1.423958951481749]
This paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU).
We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU.
LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.
arXiv Detail & Related papers (2024-03-29T04:54:22Z) - Carbon Footprint Reduction for Sustainable Data Centers in Real-Time [2.794742330785396]
We propose a Data Center Carbon Footprint Reduction (DC-CFR) multi-agent Reinforcement Learning (MARL) framework to optimize data centers for the objectives of carbon footprint reduction, energy consumption, and energy cost. The results show that the DC-CFR MARL agents effectively resolved the complex interdependencies in optimizing cooling, load shifting, and energy storage in real-time for various locations under real-world dynamic weather and grid carbon intensity conditions.
arXiv Detail & Related papers (2024-03-21T02:59:56Z) - On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud [0.6642611154902529]
We conduct a detailed data-driven analysis to understand the benefits and limitations of carbon-aware scheduling for cloud workloads.
Our findings show that while limited workload shifting can reduce carbon emissions, the practical reductions are currently far from ideal.
arXiv Detail & Related papers (2023-06-10T18:39:49Z) - Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs [64.26714148634228]
Congestion control (CC) algorithms become extremely difficult to design.
It is currently not possible to deploy AI models on network devices due to their limited computational capabilities.
We build a computationally-light solution based on a recent reinforcement learning CC algorithm.
arXiv Detail & Related papers (2022-07-05T20:42:24Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z) - HUNTER: AI based Holistic Resource Management for Sustainable Cloud Computing [26.48962351761643]
We propose an artificial intelligence (AI) based holistic resource management technique for sustainable cloud computing called HUNTER.
The proposed model formulates the goal of optimizing energy efficiency in data centers as a multi-objective scheduling problem.
Experiments on simulated and physical cloud environments show that HUNTER outperforms state-of-the-art baselines in terms of energy consumption, SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54 and 3 percent respectively.
arXiv Detail & Related papers (2021-10-11T18:11:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.