Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks
- URL: http://arxiv.org/abs/2505.15631v1
- Date: Wed, 21 May 2025 15:16:41 GMT
- Title: Guidelines for the Quality Assessment of Energy-Aware NAS Benchmarks
- Authors: Nick Kocher, Christian Wassermann, Leona Hennig, Jonas Seng, Holger Hoos, Kristian Kersting, Marius Lindauer, Matthias Müller
- Abstract summary: Energy-aware benchmarking aims to make it possible for NAS to trade off model energy consumption against accuracy. We analyse EA-HAS-Bench based on these principles and find that the choice of GPU measurement API has a large impact on the quality of results.
- Score: 26.441107070248016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Architecture Search (NAS) accelerates progress in deep learning through systematic refinement of model architectures. The downside is the increasingly large energy consumption during the search process. Surrogate-based benchmarking mitigates the cost of full training by querying a pre-trained surrogate to obtain an estimate of a model's quality. Specifically, energy-aware benchmarking aims to make it possible for NAS to favourably trade off model energy consumption against accuracy. Towards this end, we propose three design principles for such energy-aware benchmarks: (i) reliable power measurements, (ii) a wide range of GPU usage, and (iii) holistic cost reporting. We analyse EA-HAS-Bench based on these principles and find that the choice of GPU measurement API has a large impact on the quality of results. Using the Nvidia System Management Interface (SMI) on top of its underlying library (NVML) influences the sampling rate during the initial data collection, returning faulty low-power estimations. This results in poor correlation with accurate measurements obtained from an external power meter. With this study, we bring to attention several key considerations for energy-aware surrogate-based benchmarking and derive first guidelines that can help design novel benchmarks. We show a narrow usage range of the four GPUs attached to our device, ranging from 146 W to 305 W in a single-GPU setting, and narrowing down even further when using all four GPUs. To improve holistic energy reporting, we propose calibration experiments over assumptions made in popular tools such as CodeCarbon, reducing the maximum inaccuracy from 10.3 % to 8.9 % without, and to 6.6 % with, prior estimation of the expected load on the device.
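The two measurement steps the abstract describes can be sketched in a few lines: integrating discrete power samples into an energy figure, and fitting a linear calibration from tool estimates to external power-meter readings. This is a minimal illustration, not the paper's implementation; on real hardware the power samples would come from NVML (e.g. `nvmlDeviceGetPowerUsage` via `pynvml`, which reports milliwatts), and the synthetic numbers below are invented for demonstration.

```python
def energy_joules(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples into joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def calibrate(tool_estimates_w, meter_readings_w):
    """Least-squares slope/intercept mapping a tool's power estimates
    to readings from an external power meter."""
    n = len(tool_estimates_w)
    mx = sum(tool_estimates_w) / n
    my = sum(meter_readings_w) / n
    sxx = sum((x - mx) ** 2 for x in tool_estimates_w)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(tool_estimates_w, meter_readings_w))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic samples: one second at rising load, one second at steady load.
samples = [(0.0, 100.0), (1.0, 200.0), (2.0, 200.0)]
print(energy_joules(samples))  # 350.0 J

# Calibrate invented tool estimates against invented meter readings.
slope, intercept = calibrate([100.0, 200.0, 300.0], [110.0, 215.0, 330.0])
```

The calibration step mirrors the abstract's idea of replacing a tool's fixed assumptions with a device-specific correction learned from a few measured operating points.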
Related papers
- ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge [13.57054444887393]
We propose ECORE, a framework that integrates multiple dynamic routing strategies. ECORE balances energy efficiency and detection performance based on object characteristics. Results demonstrate that our proposed context-aware routing strategies can reduce energy consumption and latency by 45% and 49%, respectively.
arXiv Detail & Related papers (2025-07-08T14:16:14Z) - EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z) - The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization [18.675499212393785]
Energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments.
arXiv Detail & Related papers (2025-05-09T18:27:32Z) - A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process. Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3. By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z) - Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior. We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI Gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z) - Unveiling Energy Efficiency in Deep Learning: Measurement, Prediction, and Scoring across Edge Devices [8.140572894424208]
We conduct a threefold study, including energy measurement, prediction, and efficiency scoring.
Firstly, we present a detailed, first-of-its-kind measurement study that uncovers the energy consumption characteristics of on-device deep learning.
Secondly, we design and implement the first kernel-level energy predictors for edge devices based on our kernel-level energy dataset.
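The kernel-level predictor idea above can be sketched as a small regression: map per-kernel features to measured energy. The feature choice (GFLOPs and GB of memory traffic), the model form, and every number below are invented for illustration; the paper's actual predictors are built from its own kernel-level energy dataset and may use a different model entirely.

```python
import numpy as np

# Hypothetical per-kernel features: (GFLOPs, GB moved); targets: energy in mJ.
X = np.array([[1.0, 0.5], [2.0, 1.0], [4.0, 1.5], [8.0, 2.0]])
y = np.array([20.0, 38.0, 70.0, 130.0])

# Fit energy ≈ a*GFLOPs + b*GB + c by least squares (bias via a ones column).
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict_energy_mj(gflops, gb):
    """Predict a kernel's energy cost from its (invented) features."""
    return coef[0] * gflops + coef[1] * gb + coef[2]
```

Summing such per-kernel predictions over a model's execution trace would give a whole-inference energy estimate without instrumenting every run.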
arXiv Detail & Related papers (2023-10-19T23:55:00Z) - EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications [0.6144680854063939]
EnergyAnalyzer is a code-level static analysis tool for estimating the energy consumption of embedded software.
It was developed as part of a larger project called TeamPlay, which aimed to provide a toolchain for developing embedded applications where energy properties are first-class citizens.
arXiv Detail & Related papers (2023-05-24T10:01:32Z) - MAPLE: Microprocessor A Priori for Latency Estimation [81.91509153539566]
Modern deep neural networks must demonstrate state-of-the-art accuracy while exhibiting low latency and energy consumption.
Measuring the latency of every evaluated architecture adds a significant amount of time to the NAS process.
We propose Microprocessor A Priori for Latency Estimation (MAPLE), which does not rely on transfer learning or domain adaptation.
arXiv Detail & Related papers (2021-11-30T03:52:15Z) - An Ensemble Learning Approach for In-situ Monitoring of FPGA Dynamic Power [20.487660974785943]
We present and evaluate a power monitoring scheme capable of accurately estimating the runtime dynamic power of FPGAs.
We describe a novel and specialized ensemble model which can be decomposed into multiple customized base learners.
In experiments, we first show that a single decision tree model can achieve prediction error within 4.51% of a commercial gate-level power estimation tool.
arXiv Detail & Related papers (2020-09-03T03:39:14Z) - APQ: Joint Search for Network Architecture, Pruning and Quantization Policy [49.3037538647714]
We present APQ for efficient deep learning inference on resource-constrained hardware.
Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner.
With the same accuracy, APQ reduces the latency/energy by 2x/1.3x over MobileNetV2+HAQ.
arXiv Detail & Related papers (2020-06-15T16:09:17Z) - Benchmarking Graph Neural Networks [75.42159546060509]
Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs.
For any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress.
The GitHub repository has reached 1,800 stars and 339 forks, demonstrating the utility of the proposed open-source framework.
arXiv Detail & Related papers (2020-03-02T15:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.