Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
- URL: http://arxiv.org/abs/2407.00010v1
- Date: Thu, 25 Apr 2024 11:24:08 GMT
- Title: Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
- Authors: Grant Wilkins, Srinivasan Keshav, Richard Mortier,
- Abstract summary: Training and using Large Language Models (LLMs) require large amounts of energy.
This paper addresses the challenge of reducing energy consumption in data centers running LLMs.
We propose a hybrid data center model that uses a cost-based scheduling framework to dynamically allocate tasks across hardware accelerators.
- Score: 0.2389598109913753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Both the training and use of Large Language Models (LLMs) require large amounts of energy. Their increasing popularity, therefore, raises critical concerns regarding the energy efficiency and sustainability of data centers that host them. This paper addresses the challenge of reducing energy consumption in data centers running LLMs. We propose a hybrid data center model that uses a cost-based scheduling framework to dynamically allocate LLM tasks across hardware accelerators that differ in their energy efficiencies and computational capabilities. Specifically, our workload-aware strategy determines whether tasks are processed on energy-efficient processors or high-performance GPUs based on the number of input and output tokens in a query. Our analysis of a representative LLM dataset, finds that this hybrid strategy can reduce CPU+GPU energy consumption by 7.5% compared to a workload-unaware baseline.
Related papers
- DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency [7.073435885680335]
We propose DynamoLLM, the first energy-management framework for generative large language models.
At a service-level, DynamoLLM conserves 53% energy and 38% operational carbon emissions, and reduces 61% cost to the customer.
arXiv Detail & Related papers (2024-08-01T17:40:45Z) - Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference [6.68507515624183]
Energy availability has come to the forefront as the biggest challenge for data center expansion to serve large language models.
We show that depending on the inputs, the model, and the service-level agreements, there are several knobs available to the LLM inference provider to use for being energy efficient.
arXiv Detail & Related papers (2024-03-29T17:22:48Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - From Words to Watts: Benchmarking the Energy Costs of Large Language
Model Inference [19.439683873290623]
Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art.
These models carry significant computational challenges, especially the compute and energy costs required for inference.
arXiv Detail & Related papers (2023-10-04T17:41:59Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive
Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - POLCA: Power Oversubscription in LLM Cloud Providers [0.8299593158757622]
Large language models (LLMs) are becoming increasingly power intensive.
We show that there is a significant opportunity to oversubscribe power in LLM clusters.
We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters.
arXiv Detail & Related papers (2023-08-24T16:32:34Z) - Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A
Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z) - A Framework for Energy and Carbon Footprint Analysis of Distributed and
Federated Edge Learning [48.63610479916003]
This article breaks down and analyzes the main factors that influence the environmental footprint of distributed learning policies.
It models both vanilla and decentralized FL policies driven by consensus.
Results show that FL allows remarkable end-to-end energy savings (30%-40%) for wireless systems characterized by low bit/Joule efficiency.
arXiv Detail & Related papers (2021-03-18T16:04:42Z) - Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A
Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z) - Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable
Edge Computing Systems [87.4519172058185]
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied.
A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem.
Experimental results show that the proposed MAMRL model can reduce up to 11% non-renewable energy usage and by 22.4% the energy cost.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.