Carbon- and Precedence-Aware Scheduling for Data Processing Clusters
- URL: http://arxiv.org/abs/2502.09717v1
- Date: Thu, 13 Feb 2025 19:06:10 GMT
- Title: Carbon- and Precedence-Aware Scheduling for Data Processing Clusters
- Authors: Adam Lechowicz, Rohan Shenoy, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Christina Delimitrou
- Abstract summary: We show that carbon-aware scheduling for data processing benefits from knowledge of both time-varying carbon and precedence constraints.
Our schedulers enable a configurable priority between carbon reduction and job completion time, and we give analytical results characterizing the trade-off between the two.
Our Spark prototype on a 100-node cluster shows that a moderate configuration of $\texttt{PCAPS}$ reduces carbon footprint by up to 32.9% without significantly impacting the cluster's total efficiency.
- Score: 10.676357280358886
- Abstract: As large-scale data processing workloads continue to grow, their carbon footprint raises concerns. Prior research on carbon-aware schedulers has focused on shifting computation to align with availability of low-carbon energy, but these approaches assume that each task can be executed independently. In contrast, data processing jobs have precedence constraints (i.e., outputs of one task are inputs for another) that complicate decisions, since delaying an upstream ``bottleneck'' task to a low-carbon period will also block downstream tasks, impacting the entire job's completion time. In this paper, we show that carbon-aware scheduling for data processing benefits from knowledge of both time-varying carbon and precedence constraints. Our main contribution is $\texttt{PCAPS}$, a carbon-aware scheduler that interfaces with modern ML scheduling policies to explicitly consider the precedence-driven importance of each task in addition to carbon. To illustrate the gains due to fine-grained task information, we also study $\texttt{CAP}$, a wrapper for any carbon-agnostic scheduler that adapts the key provisioning ideas of $\texttt{PCAPS}$. Our schedulers enable a configurable priority between carbon reduction and job completion time, and we give analytical results characterizing the trade-off between the two. Furthermore, our Spark prototype on a 100-node Kubernetes cluster shows that a moderate configuration of $\texttt{PCAPS}$ reduces carbon footprint by up to 32.9% without significantly impacting the cluster's total efficiency.
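To make the precedence/carbon tension in the abstract concrete, here is a minimal, hypothetical sketch (not the published $\texttt{PCAPS}$ algorithm): unit-length tasks form a DAG, a forecast gives carbon intensity per time slot, and a greedy rule defers each task to the greenest slot its slack permits, so the critical path (and hence job completion time) never exceeds the scheduling horizon. Machine capacity is ignored for simplicity; all names and the DAG encoding are illustrative assumptions.

```python
# Toy sketch of precedence- and carbon-aware scheduling (illustrative only;
# not the actual PCAPS algorithm). Each task takes one unit-length slot.
# A task with slack (not on the critical path) can be deferred to a
# lower-carbon slot without stretching the job's completion time.
from collections import defaultdict


def critical_length(dag, task, memo=None):
    """Length of the longest chain of successors starting at `task` (inclusive)."""
    if memo is None:
        memo = {}
    if task not in memo:
        succs = dag.get(task, [])
        memo[task] = 1 + max(
            (critical_length(dag, s, memo) for s in succs), default=0
        )
    return memo[task]


def schedule(dag, carbon):
    """Assign each unit-length task a start slot, preferring low-carbon slots
    when slack allows deferral.

    dag:    {task: [successor, ...]}; every task must appear as a key.
    carbon: carbon[t] is the forecast intensity at slot t; len(carbon) is the
            horizon, which must be >= the critical-path length.
    """
    preds = defaultdict(list)
    for u, vs in dag.items():
        for v in vs:
            preds[v].append(u)
    horizon = len(carbon)
    slots, done = {}, set()
    # Repeatedly schedule every "ready" task (all predecessors placed).
    while len(done) < len(dag):
        for task in dag:
            if task in done or any(p not in done for p in preds[task]):
                continue
            # Earliest start: one slot after the latest-finishing predecessor.
            earliest = max((slots[p] + 1 for p in preds[task]), default=0)
            # Latest start that still lets the downstream chain finish in time.
            latest = horizon - critical_length(dag, task)
            # Pick the greenest feasible slot within the task's slack window.
            best = min(range(earliest, latest + 1), key=lambda t: carbon[t])
            slots[task] = best
            done.add(task)
    return slots
```

In this toy, a "bottleneck" task on the critical path has zero slack and runs as early as possible, while off-path tasks are shifted toward cleaner slots, which mirrors the abstract's point that delaying an upstream bottleneck blocks everything downstream.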
Related papers
- The Sunk Carbon Fallacy: Rethinking Carbon Footprint Metrics for Effective Carbon-Aware Scheduling [2.562727244613512]
We evaluate carbon-aware job scheduling and placement on a given set of servers under several carbon accounting metrics.
We study the factors that affect the added carbon cost of such suboptimal decision-making.
arXiv Detail & Related papers (2024-10-19T12:23:59Z) - Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning [49.91297276176978]
We propose a novel Parameter-Efficient Fine-Tuning (PEFT) method for point cloud learning, called Point GST.
Point GST freezes the pre-trained model and introduces a trainable Point Cloud Spectral Adapter (PCSA) to finetune parameters in the spectral domain.
Extensive experiments on challenging point cloud datasets demonstrate that Point GST not only outperforms its fully finetuning counterpart but also significantly reduces trainable parameters.
arXiv Detail & Related papers (2024-10-10T17:00:04Z) - CarbonClipper: Optimal Algorithms for Carbon-Aware Spatiotemporal Workload Management [11.029788598491077]
Carbon-aware workload management seeks to address the growing environmental impact of data centers.
$\mathsf{SOAD}$ formalizes the open problem of combining general metrics and deadline constraints in online algorithms.
$\texttt{CarbonClipper}$ is a learning-augmented algorithm that takes advantage of predictions.
arXiv Detail & Related papers (2024-08-14T22:08:06Z) - Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce carbon emissions of the Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z) - Continual Learning on a Diet: Learning from Sparsely Labeled Streams Under Constrained Computation [123.4883806344334]
We study a realistic Continual Learning setting where learning algorithms are granted a restricted computational budget per time step while training.
We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates.
Our extensive analysis and ablations demonstrate that DietCL is stable under a full spectrum of label sparsity, computational budget, and various other ablations.
arXiv Detail & Related papers (2024-04-19T10:10:39Z) - LACS: Learning-Augmented Algorithms for Carbon-Aware Resource Scaling with Uncertain Demand [1.423958951481749]
This paper studies the online carbon-aware resource scaling problem with unknown job lengths (OCSU).
We propose LACS, a theoretically robust learning-augmented algorithm that solves OCSU.
LACS achieves a 32% reduction in carbon footprint compared to the deadline-aware carbon-agnostic execution of the job.
arXiv Detail & Related papers (2024-03-29T04:54:22Z) - GreenCourier: Carbon-Aware Scheduling for Serverless Functions [1.2092241176897844]
GreenCourier is a scheduling framework that enables the runtime scheduling of serverless functions across geographically distributed regions based on their carbon efficiencies.
Results from our experiments show that compared to other approaches, GreenCourier reduces carbon emissions per function invocation by an average of 13.25%.
arXiv Detail & Related papers (2023-10-31T11:35:50Z) - On the Limitations of Carbon-Aware Temporal and Spatial Workload
Shifting in the Cloud [0.6642611154902529]
We conduct a detailed data-driven analysis to understand the benefits and limitations of carbon-aware scheduling for cloud workloads.
Our findings show that while limited workload shifting can reduce carbon emissions, the practical reductions are currently far from ideal.
arXiv Detail & Related papers (2023-06-10T18:39:49Z) - Machine Guided Discovery of Novel Carbon Capture Solvents [48.7576911714538]
Machine learning offers a promising method for reducing the time and resource burdens of materials development.
We have developed an end-to-end "discovery cycle" to select new aqueous amines compatible with the commercially viable acid gas scrubbing carbon capture.
The prediction process shows 60% accuracy against experiment for both material parameters and 80% for a single parameter on an external test set.
arXiv Detail & Related papers (2023-03-24T18:32:38Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.