Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
- URL: http://arxiv.org/abs/2411.18873v1
- Date: Thu, 28 Nov 2024 02:51:54 GMT
- Title: Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
- Authors: Yijia Zhang, Zhihong Gou, Shijie Cao, Weigang Feng, Sicheng Zhang, Guohao Dai, Ningyi Xu
- Abstract summary: We propose a novel search-based compilation method to generate energy-efficient GPU kernels.
Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption.
- Score: 5.03421342195771
- Abstract: Deep Neural Networks (DNNs) have revolutionized various fields, but their deployment on GPUs often leads to significant energy consumption. Unlike existing methods for reducing GPU energy consumption, which are either hardware-inflexible or limited by workload constraints, this paper addresses the problem at the GPU kernel level. We propose a novel search-based compilation method to generate energy-efficient GPU kernels by incorporating energy efficiency into the search process. To accelerate the energy evaluation process, we develop an accurate energy cost model based on high-level kernel features. Furthermore, we introduce a dynamic updating strategy for the energy cost model, reducing the need for on-device energy measurements and accelerating the search process. Our evaluation demonstrates that the proposed approach can generate GPU kernels with up to 21.69% reduced energy consumption while maintaining low latency.
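The loop the abstract describes — scoring candidate kernels with a cheap learned energy cost model over high-level features, and occasionally refreshing that model with on-device measurements — can be sketched as follows. The feature names, the linear model form, and the `true_energy()` stand-in for an on-device measurement are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch of a search loop with a dynamically updated energy
# cost model. Feature names (tile, unroll, smem_kb) and the toy
# true_energy() "measurement" are assumptions, not from the paper.

def true_energy(feat):
    """Stand-in for an on-device energy measurement of one kernel variant."""
    tile, unroll, smem_kb = feat
    return 5.0 + 0.05 * tile + 0.8 * unroll + 0.1 * smem_kb

class EnergyCostModel:
    """Linear cost model over high-level kernel features, updated online (LMS)."""
    def __init__(self, n_features, lr=1e-4):
        self.w = [0.0] * (n_features + 1)  # bias + one weight per feature
        self.lr = lr

    def predict(self, feat):
        return self.w[0] + sum(wi * fi for wi, fi in zip(self.w[1:], feat))

    def update(self, feat, measured):
        """Dynamic update: nudge weights toward a fresh on-device measurement."""
        err = measured - self.predict(feat)
        self.w[0] += self.lr * err
        for i, fi in enumerate(feat):
            self.w[i + 1] += self.lr * err * fi

def search(candidates, model, measure_every=10):
    """Score candidates with the cheap model; measure (and update) occasionally."""
    best, best_cost = None, float("inf")
    for step, feat in enumerate(candidates):
        if step % measure_every == 0:
            cost = true_energy(feat)      # sparse on-device measurement
            model.update(feat, cost)      # keep the cost model current
        else:
            cost = model.predict(feat)    # cheap model-based evaluation
        if cost < best_cost:
            best, best_cost = feat, cost
    return best, best_cost
```

The point of the sparse-measurement schedule is that model predictions cost microseconds while real energy measurements cost seconds, so the search explores far more candidates per unit time.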
Related papers
- Online Energy Optimization in GPUs: A Multi-Armed Bandit Approach [15.28157695259566]
Energy consumption has become a critical design metric and a limiting factor in the development of future computing architectures.
This paper studies a novel and practical online energy optimization problem for GPUs in HPC scenarios.
EnergyUCB is designed to dynamically adjust GPU core frequencies in real time, reducing energy consumption with minimal impact on performance.
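In the spirit of EnergyUCB, the frequency-selection problem can be sketched as a classic UCB1 bandit: each GPU core frequency is an arm, and pulling an arm means running a workload slice at that frequency and observing an energy-efficiency reward. The frequency list, reward means, and `run_slice()` simulator below are made up for illustration; they are not the paper's algorithm or data.

```python
import math
import random

# Toy UCB1 sketch: treat each GPU core frequency as a bandit arm and
# learn online which one gives the best energy-efficiency reward.
# Frequencies and reward means are illustrative assumptions.
FREQS_MHZ = [900, 1100, 1300, 1500]

def run_slice(arm, rng):
    """Stand-in for executing a workload slice at FREQS_MHZ[arm] and
    observing a normalized energy-efficiency reward in [0, 1]."""
    means = [0.3, 0.8, 0.5, 0.4]
    return min(1.0, max(0.0, rng.gauss(means[arm], 0.05)))

def ucb1(n_rounds, n_arms, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms          # running mean reward per arm
    for t in range(n_rounds):
        if t < n_arms:
            arm = t                  # play every frequency once first
        else:
            arm = max(range(n_arms),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = run_slice(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]
    return counts, values
```

After enough rounds, the pull counts concentrate on the most efficient frequency while still occasionally re-probing the others, which is how such a controller tracks a changing workload.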
arXiv Detail & Related papers (2024-10-03T17:05:34Z)
- Energy-Efficient Spiking Recurrent Neural Network for Gesture Recognition on Embedded GPUs [1.37621344207686]
This research explores the deployment of a spiking recurrent neural network (SRNN) with liquid time constant neurons for gesture recognition.
We focus on the energy efficiency and computational efficacy of NVIDIA Jetson Nano embedded GPU platforms.
arXiv Detail & Related papers (2024-08-23T10:50:29Z)
- PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices [10.01838504586422]
The continuous operation of ML-powered systems leads to significant energy use during inference.
This paper investigates how the configuration of on-device hardware elements, such as GPU, memory, and CPU frequencies, affects energy consumption for NN inference with regular fine-tuning.
We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner.
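The constrained tuning idea — minimize energy over per-component frequency settings subject to a latency SLO — can be shown with a minimal sketch. PolyThrottle uses Constrained Bayesian Optimization; here an exhaustive grid search over made-up energy and latency models stands in for it so the example stays self-contained. All frequency values and both models are illustrative assumptions.

```python
from itertools import product

# Simplified stand-in for PolyThrottle-style tuning: pick per-component
# frequencies that minimize energy subject to a latency SLO. The paper
# uses Constrained Bayesian Optimization; this exhaustive grid search
# over toy models only illustrates the constrained objective.
GPU_MHZ = [600, 900, 1200]
MEM_MHZ = [800, 1600]
CPU_MHZ = [1000, 1500, 2000]

def energy(g, m, c):
    """Made-up energy model: higher frequencies draw more power."""
    return 0.004 * g + 0.002 * m + 0.001 * c

def latency_ms(g, m, c):
    """Made-up latency model: higher frequencies run faster."""
    return 5000.0 / g + 1600.0 / m + 1000.0 / c

def tune(slo_ms):
    """Return the lowest-energy configuration that meets the latency SLO."""
    best, best_e = None, float("inf")
    for cfg in product(GPU_MHZ, MEM_MHZ, CPU_MHZ):
        if latency_ms(*cfg) <= slo_ms:        # feasibility constraint
            e = energy(*cfg)
            if e < best_e:
                best, best_e = cfg, e
    return best, best_e
```

A tight SLO forces some components to higher frequencies, so the optimum is rarely the globally cheapest configuration; Bayesian optimization finds the same trade-off with far fewer on-device evaluations than a full grid.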
arXiv Detail & Related papers (2023-10-30T20:19:41Z)
- Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z)
- FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs [27.006082622843653]
Vision applications of convolutional neural networks, such as always-on surveillance cameras, are critical for energy constraints.
This paper aims to serve as a baseline by designing detectors to reach tradeoffs between energy and performance from two perspectives.
arXiv Detail & Related papers (2023-01-17T06:24:08Z)
- Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models [8.927248087602942]
We investigate techniques that can be used to reduce the energy consumption of common NLP applications.
These techniques can lead to significant reductions in energy consumption when training language models or using them for inference.
arXiv Detail & Related papers (2022-05-19T16:03:55Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from the huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing [48.11023234245863]
We propose a new framework called BottleFit, which includes a novel training strategy to achieve high accuracy even with strong compression rates.
BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset.
We show that BottleFit decreases power consumption and latency by up to 49% and 89%, respectively, compared with local computing, and by 37% and 55% compared with edge offloading.
arXiv Detail & Related papers (2022-01-07T22:08:07Z)
- AdderNet and its Minimalist Hardware Design for Energy-Efficient Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet achieves a practical 16% improvement in speed.
We conclude that AdderNet surpasses all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z)
- The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems [45.479582612113205]
We show how to improve the performance and power efficiency of RL training on CPU-GPU systems.
We quantify the overall hardware utilization on a state-of-the-art distributed RL training framework.
We also introduce a new system design metric, CPU/GPU ratio, and show how to find the optimal balance between CPU and GPU resources.
arXiv Detail & Related papers (2020-12-08T04:50:05Z)
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.