The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
- URL: http://arxiv.org/abs/2505.06371v2
- Date: Thu, 16 Oct 2025 17:51:15 GMT
- Title: The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
- Authors: Jae-Won Chung, Jeff J. Ma, Ruofan Wu, Jiachen Liu, Oh Jun Kweon, Yuxuan Xia, Zhiyu Wu, Mosharaf Chowdhury
- Abstract summary: Energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments.
- Score: 24.32172951691564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the adoption of Generative AI in real-world services grows explosively, energy has emerged as a critical bottleneck resource. However, energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments, and the corresponding ML.ENERGY Leaderboard, which have served as a valuable resource for those hoping to understand and optimize the energy consumption of their generative AI services. In this paper, we explain four key design principles for benchmarking ML energy that we have acquired over time, and then describe how they are implemented in the ML.ENERGY Benchmark. We then highlight results from the early 2025 iteration of the benchmark, including energy measurements of 40 widely used model architectures across 6 different tasks, case studies of how ML design choices impact energy consumption, and how automated optimization recommendations can lead to significant (sometimes more than 40%) energy savings without changing what is being computed by the model. The ML.ENERGY Benchmark is open-source and can be easily extended to various customized models and application scenarios.
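Benchmarks in this space typically derive energy by sampling device power and integrating over the inference window (E = ∫P dt). A minimal sketch of that accounting, with a hypothetical power trace standing in for real NVML samples:

```python
def energy_joules(power_samples_w, interval_s):
    """Integrate evenly spaced power samples (watts) into energy (joules)
    via the trapezoidal rule: E = integral of P dt."""
    if len(power_samples_w) < 2:
        return 0.0
    return sum(
        (a + b) / 2 * interval_s
        for a, b in zip(power_samples_w, power_samples_w[1:])
    )

# Hypothetical 100 ms power samples spanning one inference request:
trace = [60.0, 180.0, 220.0, 210.0, 70.0]
print(round(energy_joules(trace, 0.1), 1))  # -> 67.5
```

Real tools read such samples from hardware counters (e.g., via NVML) at a fixed interval and apply exactly this kind of integration per measurement window.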
Related papers
- Towards Green AI: Decoding the Energy of LLM Inference in Software Development [46.879983975894135]
AI-assisted tools are increasingly integrated into software development, but their reliance on large language models (LLMs) introduces substantial computational and energy costs. We conduct a phase-level analysis of LLM inference energy consumption, distinguishing between (1) prefill, where the model processes the input and builds internal representations, and (2) decoding, where output tokens are generated using the stored state.
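The phase split described above comes down to simple per-phase energy accounting; the power and duration figures below are hypothetical, chosen only to illustrate why the long, memory-bound decode phase often dominates:

```python
def phase_energy(avg_power_w, duration_s):
    """Energy of one inference phase: E = P_avg * t (joules)."""
    return avg_power_w * duration_s

# Hypothetical figures for a single request: a short, compute-bound
# prefill at high power versus a long, memory-bound decode.
prefill = phase_energy(avg_power_w=250.0, duration_s=0.08)
decode = phase_energy(avg_power_w=180.0, duration_s=1.50)
total = prefill + decode
print(f"prefill {prefill:.1f} J, decode {decode:.1f} J "
      f"({decode / total:.0%} of total)")  # decode dominates here
```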
arXiv Detail & Related papers (2026-02-05T14:38:19Z)
- Understanding Efficiency: Quantization, Batching, and Serving Strategies in LLM Energy Use [4.513690948889834]
Large Language Models (LLMs) are increasingly deployed in production, shifting the burden of computational resources and energy demands from training to inference. We show how system-level design choices can lead to orders-of-magnitude differences in energy consumption for the same model. Our findings motivate phase-aware energy profiling and system-level optimizations for greener AI services.
arXiv Detail & Related papers (2026-01-29T22:16:25Z) - Where Do the Joules Go? Diagnosing Inference Energy Consumption [10.337349215328839]
We present a large-scale measurement study of inference time and energy across the generative AI landscape with 46 models, 7 tasks, and 1,858 different configurations. Our empirical findings span order-of-magnitude variations: LLM task type can lead to 25x energy differences, video generation sometimes consumes more than 100x the energy of images, and GPU utilization differences can result in 3-5x energy differences.
arXiv Detail & Related papers (2026-01-29T18:16:45Z)
- Optimising for Energy Efficiency and Performance in Machine Learning [3.8803432012641395]
We show that Energy Consumption Optimiser (ECOpt) optimises for energy efficiency and model performance. ECOpt quantifies the trade-off between these metrics as an interpretable frontier. We show that ECOpt can have a net positive environmental impact and use it to uncover seven models for CIFAR-10 that improve upon the state of the art.
arXiv Detail & Related papers (2026-01-13T21:28:58Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency [6.306413686006502]
We conduct a comprehensive analysis of 28 quantized Large Language Models (LLMs) from the Ollama library. We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy in different quantization settings.
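Comparisons like this usually reduce to energy per generated token versus accuracy at each quantization level. A minimal sketch of that bookkeeping, with hypothetical measurements (not the paper's numbers):

```python
# Hypothetical measurements for one model at three quantization levels
# (illustrative only): total energy and task accuracy per 1,000 tokens.
runs = {
    "fp16": {"energy_j": 540.0, "tokens": 1000, "accuracy": 0.78},
    "q8":   {"energy_j": 310.0, "tokens": 1000, "accuracy": 0.77},
    "q4":   {"energy_j": 190.0, "tokens": 1000, "accuracy": 0.73},
}

for level, r in runs.items():
    j_per_tok = r["energy_j"] / r["tokens"]
    print(f"{level}: {j_per_tok:.2f} J/token at {r['accuracy']:.0%} accuracy")
```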
arXiv Detail & Related papers (2025-04-04T11:29:30Z)
- Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations [2.2765705959685234]
This study investigates the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines. We employ software-based power measurements to ensure ease of replication across diverse configurations, models, and datasets.
arXiv Detail & Related papers (2025-03-31T10:28:04Z)
- MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI [5.50579824344998]
Machine learning (ML) technologies have led to a surge in power consumption across diverse systems. This paper introduces MLPerf Power, a comprehensive benchmarking methodology to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts.
arXiv Detail & Related papers (2024-10-15T20:06:33Z)
- Impact of ML Optimization Tactics on Greener Pre-Trained ML Models [46.78148962732881]
This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) on 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
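The core of dynamic quantization is mapping float weights to small integers with a per-tensor scale, trading a little precision for cheaper arithmetic and memory traffic. A toy sketch of that roundtrip (illustrative only, not PyTorch's actual implementation):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale floats into
    [-127, 127] integers, the core idea behind dynamic quantization."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid 0 for all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.30, -1.27, 0.05, 0.81]
q, s = quantize_int8(w)
approx = dequantize(q, s)
print(q)                                                  # -> [30, -127, 5, 81]
print(max(abs(a - b) for a, b in zip(w, approx)) < 0.01)  # -> True
```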
arXiv Detail & Related papers (2024-09-19T16:23:03Z)
- Normalizing Energy Consumption for Hardware-Independent Evaluation [9.658615045493734]
We present a novel methodology for normalizing energy consumption across different hardware platforms.
Our approach shows that the number of reference points, the type of regression, and the inclusion of computational metrics significantly influence the normalization process.
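One way to realize such normalization is to fit a regression between reference measurements taken on two devices and use it to translate new measurements. A minimal least-squares sketch, with hypothetical joule figures:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx

# Hypothetical joules for the same reference workloads on two GPUs:
ref_gpu_a = [100.0, 200.0, 400.0]
ref_gpu_b = [60.0, 115.0, 230.0]
a, b = fit_linear(ref_gpu_a, ref_gpu_b)
# Translate a new 300 J measurement on GPU A into GPU B terms:
print(round(a * 300.0 + b, 1))  # -> 172.9
```

More reference points, and regressions that include computational metrics (FLOPs, memory traffic), would refine this simple one-variable fit, which is the sensitivity the paper studies.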
arXiv Detail & Related papers (2024-09-09T13:38:00Z)
- Computing Within Limits: An Empirical Study of Energy Consumption in ML Training and Inference [2.553456266022126]
Machine learning (ML) has seen tremendous advancements, but its environmental footprint remains a concern.
Acknowledging the growing environmental impact of ML, this paper investigates Green ML.
arXiv Detail & Related papers (2024-06-20T13:59:34Z)
- Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
Generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on a representative benchmark dataset using these models.
We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions.
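The per-1,000-inferences cost metric reduces to straightforward unit conversion; the per-inference energy and grid carbon intensity below are hypothetical placeholders:

```python
def deployment_cost(joules_per_inference, n=1000, grid_gco2_per_kwh=400.0):
    """Energy (kWh) and carbon (gCO2) for n inferences.
    1 kWh = 3.6e6 J; the grid intensity default is a hypothetical average."""
    kwh = joules_per_inference * n / 3.6e6
    return kwh, kwh * grid_gco2_per_kwh

# Hypothetical: a text-generation request costing 500 J on average.
kwh, gco2 = deployment_cost(500.0)
print(f"{kwh:.3f} kWh, {gco2:.0f} gCO2 per 1,000 inferences")
```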
arXiv Detail & Related papers (2023-11-28T15:09:36Z)
- LEAF + AIO: Edge-Assisted Energy-Aware Object Detection for Mobile Augmented Reality [77.00418462388525]
Mobile augmented reality (MAR) applications consume substantial energy.
We design an edge-based energy-aware MAR system that enables MAR devices to dynamically change their configurations.
Our proposed dynamic MAR configuration adaptations can minimize the per frame energy consumption of multiple MAR clients.
arXiv Detail & Related papers (2022-05-27T06:11:50Z)
- Automated Machine Learning: A Case Study on Non-Intrusive Appliance Load Monitoring [81.06807079998117]
We propose a novel approach to enable Automated Machine Learning (AutoML) for Non-Intrusive Appliance Load Monitoring (NIALM). NIALM offers a cost-effective alternative to smart meters for measuring the energy consumption of electric devices and appliances.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
- Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems [87.4519172058185]
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied.
A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem.
Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and energy cost by 22.4%.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.