Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
- URL: http://arxiv.org/abs/2411.18873v1
- Date: Thu, 28 Nov 2024 02:51:54 GMT
- Title: Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
- Authors: Yijia Zhang, Zhihong Gou, Shijie Cao, Weigang Feng, Sicheng Zhang, Guohao Dai, Ningyi Xu
- Abstract summary: We propose a novel search-based compilation method to generate energy-efficient GPU kernels.
Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption.
- Score: 5.03421342195771
- Abstract: Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate energy-efficient GPU kernels by incorporating energy efficiency into the search process. To accelerate the energy evaluation process, we develop an accurate energy cost model based on high-level kernel features. Furthermore, we introduce a dynamic updating strategy for the energy cost model, reducing the need for on-device energy measurements and accelerating the search process. Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption while maintaining low latency.
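The loop the abstract describes — scoring candidate kernels with a cheap learned energy cost model over high-level features, and occasionally refreshing that model with on-device measurements — can be sketched as follows. The feature names, the linear model form, and the `true_energy()` stand-in for an on-device measurement are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of a search loop with a dynamically updated energy
# cost model. Feature names (tile, unroll, smem_kb) and the toy
# true_energy() "measurement" are assumptions, not from the paper.

def true_energy(feat):
    """Stand-in for an on-device energy measurement of one kernel variant."""
    tile, unroll, smem_kb = feat
    return 5.0 + 0.05 * tile + 0.8 * unroll + 0.1 * smem_kb

class EnergyCostModel:
    """Linear cost model over high-level kernel features, updated online (LMS)."""
    def __init__(self, n_features, lr=1e-4):
        self.w = [0.0] * (n_features + 1)  # bias + one weight per feature
        self.lr = lr

    def predict(self, feat):
        return self.w[0] + sum(wi * fi for wi, fi in zip(self.w[1:], feat))

    def update(self, feat, measured):
        """Dynamic update: nudge weights toward a fresh on-device measurement."""
        err = measured - self.predict(feat)
        self.w[0] += self.lr * err
        for i, fi in enumerate(feat):
            self.w[i + 1] += self.lr * err * fi

def search(candidates, model, measure_every=10):
    """Score candidates with the cheap model; measure (and update) occasionally."""
    best, best_cost = None, float("inf")
    for step, feat in enumerate(candidates):
        if step % measure_every == 0:
            cost = true_energy(feat)      # sparse on-device measurement
            model.update(feat, cost)      # keep the cost model current
        else:
            cost = model.predict(feat)    # cheap model-based evaluation
        if cost < best_cost:
            best, best_cost = feat, cost
    return best, best_cost
```

The point of the sparse-measurement schedule is that model predictions cost microseconds while real energy measurements cost seconds, so the search explores far more candidates per unit time.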
Related papers
- Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach [15.28157695259566]
Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures.
This paper studies a novel and practical online energy optimization problem for GPUs in HPC scenarios.
EnergyUCB is designed to dynamically adjust GPU core frequencies in real time, reducing energy consumption with minimal impact on performance.
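In the spirit of EnergyUCB, the frequency-selection problem can be sketched as a classic UCB1 bandit: each GPU core frequency is an arm, and pulling an arm means running a workload slice at that frequency and observing an energy-efficiency reward. The frequency list, reward means, and `run_slice()` simulator below are made up for illustration; they are not the paper's algorithm or data.

```python
import math
import random

# Toy UCB1 sketch: treat each GPU core frequency as a bandit arm and
# learn online which one gives the best energy-efficiency reward.
# Frequencies and reward means are illustrative assumptions.
FREQS_MHZ = [900, 1100, 1300, 1500]

def run_slice(arm, rng):
    """Stand-in for executing a workload slice at FREQS_MHZ[arm] and
    observing a normalized energy-efficiency reward in [0, 1]."""
    means = [0.3, 0.8, 0.5, 0.4]
    return min(1.0, max(0.0, rng.gauss(means[arm], 0.05)))

def ucb1(n_rounds, n_arms, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms          # running mean reward per arm
    for t in range(n_rounds):
        if t < n_arms:
            arm = t                  # play every frequency once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = run_slice(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return counts, values
```

After enough rounds, the pull counts concentrate on the most efficient frequency while still occasionally re-probing the others, which is how such a controller tracks a changing workload.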
arXiv Detail & Related papers (2024-10-03T17:05:34Z)
- Energy-Efficient Spiking Recurrent Neural Network for Gesture Recognition on Embedded GPUs [1.37621344207686]
This research explores the deployment of a spiking recurrent neural network (SRNN) with liquid time constant neurons for gesture recognition.
We focus on the energy efficiency and computational efficacy of NVIDIA Jetson Nano embedded GPU platforms.
arXiv Detail & Related papers (2024-08-23T10:50:29Z)
- PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices [10.01838504586422]
The continuous operation of ML-powered systems leads to significant energy use during inference.
This paper investigates how the configuration of on-device hardware elements, such as GPU, memory, and CPU frequencies, affects energy consumption for NN inference with regular fine-tuning.
We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner.
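The constrained tuning idea — minimize energy over per-component frequency settings subject to a latency SLO — can be shown with a minimal sketch. PolyThrottle uses Constrained Bayesian Optimization; here an exhaustive grid search over made-up energy and latency models stands in for it so the example stays self-contained. All frequency values and both models are illustrative assumptions.

```python
from itertools import product

# Simplified stand-in for PolyThrottle-style tuning: pick per-component
# frequencies that minimize energy subject to a latency SLO. The paper
# uses Constrained Bayesian Optimization; this exhaustive grid search
# over toy models only illustrates the constrained objective.
GPU_MHZ = [600, 900, 1200]
MEM_MHZ = [800, 1600]
CPU_MHZ = [1000, 1500, 2000]

def energy(g, m, c):
    """Made-up energy model: higher frequencies draw more power."""
    return 0.004 * g + 0.002 * m + 0.001 * c

def latency_ms(g, m, c):
    """Made-up latency model: higher frequencies run faster."""
    return 5000.0 / g + 1600.0 / m + 1000.0 / c

def tune(slo_ms):
    """Return the lowest-energy configuration that meets the latency SLO."""
    best, best_e = None, float("inf")
    for cfg in product(GPU_MHZ, MEM_MHZ, CPU_MHZ):
        if latency_ms(*cfg) <= slo_ms:        # feasibility constraint
            e = energy(*cfg)
            if e < best_e:
                best, best_e = cfg, e
    return best, best_e
```

A tight SLO forces some components to higher frequencies, so the optimum is rarely the globally cheapest configuration; Bayesian optimization finds the same trade-off with far fewer on-device evaluations than a full grid.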
arXiv Detail & Related papers (2023-10-30T20:19:41Z)
- Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z)
- FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs [27.006082622843653]
Vision applications of convolutional neural networks, such as always-on surveillance cameras, are critical for energy constraints.
This paper aims to serve as a baseline by designing detectors to reach tradeoffs between energy and performance from two perspectives.
arXiv Detail & Related papers (2023-01-17T06:24:08Z)
- Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models [8.927248087602942]
We investigate techniques that can be used to reduce the energy consumption of common NLP applications.
These techniques can lead to significant reductions in energy consumption when training language models or using them for inference.
arXiv Detail & Related papers (2022-05-19T16:03:55Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from the huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing [48.11023234245863]
We propose a new framework called BottleFit, which includes a novel training strategy to achieve high accuracy even with strong compression rates.
BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset.
We show that BottleFit decreases power consumption and latency by up to 49% and 89%, respectively, compared with local computing, and by 37% and 55% compared with edge offloading.
arXiv Detail & Related papers (2022-01-07T22:08:07Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet achieves a practical 16% improvement in speed.
We conclude that AdderNet surpasses all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems [45.479582612113205]
We show how to improve the performance and power efficiency of RL training on CPU-GPU systems.
We quantify the overall hardware utilization on a state-of-the-art distributed RL training framework.
We also introduce a new system design metric, CPU/GPU ratio, and show how to find the optimal balance between CPU and GPU resources.
arXiv Detail & Related papers (2020-12-08T04:50:05Z)
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.