Related papers: Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach

URL: http://arxiv.org/abs/2410.11855v1
Date: Thu, 03 Oct 2024 17:05:34 GMT
Title: Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach
Authors: Xiongxiao Xu, Solomon Abera Bekele, Brice Videau, Kai Shu,
Abstract summary: Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures. This paper studies a novel and practical online energy optimization problem for GPU in HPC scenarios. EnergyUCB is designed to dynamically adjust GPU core frequencies in real-time, reducing energy consumption with minimal impact on performance.
Score: 15.28157695259566
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures, from small wearable devices to large-scale leadership computing facilities. The predominant methods in energy management optimization are focused on CPUs. However, GPUs are increasingly significant and account for the majority of energy consumption in heterogeneous high performance computing (HPC) systems. Moreover, they typically rely on either purely offline training or a hybrid of offline and online training, which are impractical and lead to energy loss during data collection. Therefore, this paper studies a novel and practical online energy optimization problem for GPUs in HPC scenarios. The problem is challenging due to the inherent performance-energy trade-offs of GPUs, the exploration & exploitation dilemma across frequencies, and the lack of explicit performance counters in GPUs. To address these challenges, we formulate the online energy consumption optimization problem as a multi-armed bandit framework and develop a novel bandit based framework EnergyUCB. EnergyUCB is designed to dynamically adjust GPU core frequencies in real-time, reducing energy consumption with minimal impact on performance. Specifically, the proposed framework EnergyUCB (1) balances the performance-energy trade-off in the reward function, (2) effectively navigates the exploration & exploitation dilemma when adjusting GPU core frequencies online, and (3) leverages the ratio of GPU core utilization to uncore utilization as a real-time GPU performance metric. Experiments on a wide range of real-world HPC benchmarks demonstrate that EnergyUCB can achieve substantial energy savings. The code of EnergyUCB is available at https://github.com/XiongxiaoXu/EnergyUCB-Bandit.

Related papers

Power- and Fragmentation-aware Online Scheduling for GPU Datacenters [9.29180785233729]
We focus on two objectives: minimizing GPU fragmentation and reducing power consumption. To this end, we propose PWR, a novel scheduling policy to minimize power usage by selecting power-efficient GPU and CPU combinations. We show how PWR, when combined with FGD, achieves a balanced trade-off between reducing power consumption and minimizing GPU fragmentation.
arXiv Detail & Related papers (2024-12-23T11:27:17Z)
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach [5.03421342195771]
We propose a novel search-based compilation method to generate energy-efficient GPU kernels. Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption.
arXiv Detail & Related papers (2024-11-28T02:51:54Z)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs [55.95879347182669]
MoE architecture is renowned for its ability to increase model capacity without a proportional increase in inference cost. MoE-Lightning introduces a novel CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high resource utilization. MoE-Lightning can achieve up to 10.3x higher throughput than state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a single T4 GPU (16GB)
arXiv Detail & Related papers (2024-11-18T01:06:12Z)
Learning Iterative Reasoning through Energy Diffusion [90.24765095498392]
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks. IRED learns energy functions to represent the constraints between input conditions and desired outputs. We show IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks.
arXiv Detail & Related papers (2024-06-17T03:36:47Z)
Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale [20.30679358575365]
Recent large language models require considerable resources to train and deploy. With the right amount of power-capping, we show significant decreases in both temperature and power draw. Our work is the first to conduct and make available a detailed analysis of the effects of GPU power-capping at the supercomputing scale.
arXiv Detail & Related papers (2024-02-25T02:22:34Z)
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices [10.01838504586422]
The continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware-elements such as GPU, memory, and CPU frequency, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimize configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner.
arXiv Detail & Related papers (2023-10-30T20:19:41Z)
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU. This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption. Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy. We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z)
Precise Energy Consumption Measurements of Heterogeneous Artificial Intelligence Workloads [0.534434568021034]
We present measurements of the energy consumption of two typical applications of deep learning models on different types of compute nodes. One advantage of our approach is that the information on energy consumption is available to all users of the supercomputer.
arXiv Detail & Related papers (2022-12-03T21:40:55Z)
Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models [8.927248087602942]
We investigate techniques that can be used to reduce the energy consumption of common NLP applications. These techniques can lead to significant reduction in energy consumption when training language models or their use for inference.
arXiv Detail & Related papers (2022-05-19T16:03:55Z)
SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to external dynamic random-access memory (DRAM) for storage. We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation. We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems [45.479582612113205]
We show how to improve the performance and power efficiency of RL training on CPU-GPU systems. We quantify the overall hardware utilization on a state-of-the-art distributed RL training framework. We also introduce a new system design metric, CPU/GPU ratio, and show how to find the optimal balance between CPU and GPU resources.
arXiv Detail & Related papers (2020-12-08T04:50:05Z)
Faster than FAST: GPU-Accelerated Frontend for High-Speed VIO [46.20949184826173]
This work focuses on the applicability of efficient low-level, GPU hardware-specific instructions to improve on existing computer vision algorithms. Especially non-maxima suppression and the subsequent feature selection are prominent contributors to the overall image processing latency.
arXiv Detail & Related papers (2020-03-30T14:16:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.