The Case for Learning Application Behavior to Improve Hardware Energy
Efficiency
- URL: http://arxiv.org/abs/2004.13074v2
- Date: Mon, 23 Nov 2020 20:12:39 GMT
- Title: The Case for Learning Application Behavior to Improve Hardware Energy
Efficiency
- Authors: Kevin Weston, Vahid Jafanza, Arnav Kansal, Abhishek Taur, Mohamed
Zahran, Abdullah Muzahid
- Abstract summary: We propose to use the harvested knowledge to tune hardware configurations.
Our proposed approach, called FORECASTER, uses a deep learning model to learn what configuration of hardware resources provides the optimal energy efficiency for a certain behavior of an application.
Our results show that FORECASTER can save as much as 18.4% system power over the baseline setup with all resources.
- Score: 2.4425948078034847
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer applications are continuously evolving. However, significant
knowledge can be harvested from a set of applications and applied in the
context of unknown applications. In this paper, we propose to use the harvested
knowledge to tune hardware configurations. The goal of such tuning is to
maximize hardware efficiency (i.e., maximize an application's performance while
minimizing the energy consumption). Our proposed approach, called FORECASTER,
uses a deep learning model to learn what configuration of hardware resources
provides the optimal energy efficiency for a certain behavior of an
application. During the execution of an unseen application, the model uses the
learned knowledge to reconfigure hardware resources in order to maximize energy
efficiency. We have provided a detailed design and implementation of FORECASTER
and compared its performance against a prior state-of-the-art hardware
reconfiguration approach. Our results show that FORECASTER can save as much as
18.4% system power over the baseline setup with all resources. On average,
FORECASTER saves 16% system power over the baseline setup while sacrificing
less than 0.01% of overall performance. Compared to the prior scheme,
FORECASTER increases power savings by 7%.
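The abstract describes the approach only at a high level. As a rough illustration (not the paper's implementation), a FORECASTER-style controller could be a small network that maps hardware-counter readings for the current execution phase to one of a fixed set of resource configurations; the feature names, knob values, and model shape below are all assumptions.
```python
# Hypothetical sketch of a FORECASTER-style controller: a small neural network
# maps recent hardware-counter readings to one of a fixed set of resource
# configurations. The configurations, counters, and model shape are
# illustrative assumptions, not the paper's actual design.
import torch
import torch.nn as nn

# Assumed reconfigurable knobs: (active cache ways, issue width, prefetcher on/off)
CONFIGS = [(16, 4, 1), (8, 4, 1), (8, 2, 1), (4, 2, 0)]

class ConfigPredictor(nn.Module):
    def __init__(self, n_counters: int = 8, n_configs: int = len(CONFIGS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_counters, 32), nn.ReLU(),
            nn.Linear(32, n_configs),
        )

    def forward(self, counters: torch.Tensor) -> torch.Tensor:
        return self.net(counters)          # logits over configurations

def control_loop(model, read_counters, apply_config, intervals):
    """Each interval: read counters, predict, and apply the configuration
    expected to give the best energy efficiency for the current phase."""
    model.eval()
    with torch.no_grad():
        for _ in range(intervals):
            x = torch.tensor(read_counters(), dtype=torch.float32)
            best = int(model(x).argmax())
            apply_config(CONFIGS[best])    # reconfigure hardware for the next interval
```
Training such a predictor would require offline traces pairing counter readings with the configuration measured to be most energy-efficient; the paper's actual feature set and hardware knobs may differ.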
Related papers
- Secure Resource Allocation via Constrained Deep Reinforcement Learning [49.15061461220109]
We present SARMTO, a framework that balances resource allocation, task offloading, security, and performance.
SARMTO consistently outperforms five baseline approaches, achieving up to a 40% reduction in system costs.
These enhancements highlight SARMTO's potential to revolutionize resource management in intricate distributed computing environments.
arXiv Detail & Related papers (2025-01-20T15:52:43Z)
- Energy consumption of code small language models serving with runtime engines and execution providers [11.998900897003997]
Small Language Models (SLMs) offer a promising solution to reduce resource demands.
Our goal is to analyze the impact of deep learning engines and execution providers on energy consumption, execution time, and computing-resource utilization.
arXiv Detail & Related papers (2024-12-19T22:44:02Z)
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
- A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes [0.46040036610482665]
We use Reinforcement Learning to design a power capping policy on cloud compute nodes.
We show how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.
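As a hedged illustration of the kind of policy described here (not the paper's formulation), a tabular Q-learning agent could select among a few discrete power caps using a reward that trades application progress against power draw; the cap values, reward weights, and state encoding are all assumptions.
```python
# Illustrative sketch only (not the paper's algorithm): tabular Q-learning over
# a few discrete power caps, with a reward balancing application progress
# against power consumption. All constants and signal names are assumptions.
from collections import defaultdict
import random

CAPS_WATTS = [100, 130, 160, 190]              # hypothetical per-node power caps
Q = defaultdict(float)                         # Q[(state, cap_index)] -> value

def reward(progress: float, power: float, beta: float = 0.01) -> float:
    # Favor application progress, penalize power draw.
    return progress - beta * power

def choose_cap(state, eps: float = 0.1) -> int:
    if random.random() < eps:                  # occasional exploration
        return random.randrange(len(CAPS_WATTS))
    return max(range(len(CAPS_WATTS)), key=lambda a: Q[(state, a)])

def update(state, action, r, next_state, lr: float = 0.1, gamma: float = 0.9):
    best_next = max(Q[(next_state, a)] for a in range(len(CAPS_WATTS)))
    Q[(state, action)] += lr * (r + gamma * best_next - Q[(state, action)])
```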
arXiv Detail & Related papers (2023-08-15T23:25:52Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- U-Boost NAS: Utilization-Boosted Differentiable Neural Architecture Search [50.33956216274694]
Optimizing resource utilization in target platforms is key to achieving high performance during DNN inference.
We propose a novel hardware-aware NAS framework that optimizes not only for task accuracy and inference latency but also for resource utilization.
We achieve 2.8 - 4x speedup for DNN inference compared to prior hardware-aware NAS methods.
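One plausible reading of an objective that also rewards utilization is a weighted combination of task loss, latency, and under-utilization of the target hardware; the sketch below is an assumption, not the paper's loss formulation.
```python
# Hypothetical sketch of a combined NAS objective that also rewards hardware
# utilization. The weights and the way latency/utilization are estimated are
# assumptions, not the paper's formulation.
def nas_objective(task_loss: float, latency_ms: float, utilization: float,
                  lam_latency: float = 0.05, lam_util: float = 0.5) -> float:
    # Lower is better: penalize latency and under-utilization of the target HW.
    return task_loss + lam_latency * latency_ms + lam_util * (1.0 - utilization)
```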
arXiv Detail & Related papers (2022-03-23T13:44:15Z)
- Deep Reinforcement Learning Based Multidimensional Resource Management for Energy Harvesting Cognitive NOMA Communications [64.1076645382049]
The combination of energy harvesting (EH), cognitive radio (CR), and non-orthogonal multiple access (NOMA) is a promising solution to improve energy efficiency.
In this paper, we study the spectrum, energy, and time resource management for deterministic-CR-NOMA IoT systems.
arXiv Detail & Related papers (2021-09-17T08:55:48Z)
- Intelligent colocation of HPC workloads [0.0]
Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized.
It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique is to colocate multiple applications on the same server.
We show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware performance counters.
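A minimal sketch of this modeling step, assuming standalone hardware-counter profiles as features and measured slowdown as the regression target (the regressor choice and feature names are illustrative, not the paper's), might look like the following.
```python
# Illustrative sketch: predict the slowdown of two colocated applications from
# their standalone hardware-counter profiles, then colocate pairs with low
# predicted interference. Model choice and features are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_features(counters_a: np.ndarray, counters_b: np.ndarray) -> np.ndarray:
    # e.g. cache misses, memory bandwidth, IPC of each candidate application
    return np.concatenate([counters_a, counters_b])

model = RandomForestRegressor(n_estimators=100)

def fit(pairs, slowdowns):
    X = np.stack([make_features(a, b) for a, b in pairs])
    model.fit(X, slowdowns)                # measured degradation when colocated

def predicted_slowdown(counters_a, counters_b) -> float:
    return float(model.predict(make_features(counters_a, counters_b)[None, :])[0])
```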
arXiv Detail & Related papers (2021-03-16T12:35:35Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- Intelligent Resource Allocation in Dense LoRa Networks using Deep Reinforcement Learning [5.035252201462008]
We propose a multi-channel scheme for LoRaDRL.
Results demonstrate that the proposed algorithm significantly improves the long-range wide area network (LoRaWAN) packet delivery ratio (PDR).
We show that LoRaDRL's output improves the performance of state-of-the-art techniques, in some cases by more than 500% in terms of PDR.
arXiv Detail & Related papers (2020-12-22T07:41:47Z)
- AVAC: A Machine Learning based Adaptive RRAM Variability-Aware Controller for Edge Devices [3.7346292069282643]
We propose an Adaptive RRAM Variability-Aware Controller, AVAC, which periodically updates Wait Buffer and batch sizes.
AVAC allows Edge devices to adapt to different applications and their stages, to improve performance and reduce energy consumption.
arXiv Detail & Related papers (2020-05-06T19:06:51Z)
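The AVAC entry above describes a controller that periodically retunes two knobs (Wait Buffer size and batch size) as the workload changes. As a hedged sketch of such a loop (all names, knob values, and the scoring heuristic are assumptions, not the paper's design):
```python
# Hypothetical sketch of an AVAC-style periodic controller: every epoch, pick
# the (wait_buffer, batch_size) pair that a learned scorer predicts will
# minimize an energy-delay product for the current workload phase.
import itertools

WAIT_BUFFER_SIZES = [16, 32, 64]
BATCH_SIZES = [4, 8, 16]

def best_setting(scorer, phase_features):
    """scorer(features, wb, bs) -> predicted energy-delay product (lower is better)."""
    return min(itertools.product(WAIT_BUFFER_SIZES, BATCH_SIZES),
               key=lambda cfg: scorer(phase_features, *cfg))

def control(scorer, observe_phase, apply_setting, epochs):
    for _ in range(epochs):
        wb, bs = best_setting(scorer, observe_phase())
        apply_setting(wait_buffer=wb, batch_size=bs)   # reconfigure for next epoch
```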
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.