Where Do the Joules Go? Diagnosing Inference Energy Consumption
- URL: http://arxiv.org/abs/2601.22076v2
- Date: Fri, 30 Jan 2026 02:31:56 GMT
- Title: Where Do the Joules Go? Diagnosing Inference Energy Consumption
- Authors: Jae-Won Chung, Ruofan Wu, Jeff J. Ma, Mosharaf Chowdhury
- Abstract summary: We present a large-scale measurement study of inference time and energy across the generative AI landscape with 46 models, 7 tasks, and 1,858 different configurations. Our empirical findings span order-of-magnitude variations: LLM task type can lead to 25$\times$ energy differences, video generation sometimes consumes more than 100$\times$ the energy of images, and GPU utilization differences can result in 3--5$\times$ energy differences.
- Score: 10.337349215328839
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy is now a critical ML computing resource. While measuring energy consumption and observing trends is a valuable first step, accurately understanding and diagnosing why those differences occur is crucial for optimization. To that end, we begin by presenting a large-scale measurement study of inference time and energy across the generative AI landscape with 46 models, 7 tasks, and 1,858 different configurations on NVIDIA H100 and B200 GPUs. Our empirical findings span order-of-magnitude variations: LLM task type can lead to 25$\times$ energy differences, video generation sometimes consumes more than 100$\times$ the energy of images, and GPU utilization differences can result in 3--5$\times$ energy differences. Based on our observations, we present a framework for reasoning about the underlying mechanisms that govern time and energy consumption. The essence is that time and energy are determined by latent metrics like memory and utilization, which are in turn affected by various factors across the algorithm, software, and hardware layers. Our framework also extends directly to throughput per watt, a critical metric for power-constrained datacenters.
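The abstract's core framing, that time and energy are the observable outputs of latent metrics like power draw and utilization, and that throughput per watt follows directly from them, can be sketched as a toy calculation. This is not the paper's code; the function names and all numbers are illustrative:

```python
# Toy decomposition (illustrative, not the paper's implementation):
# energy follows from average power draw and wall-clock time, and
# throughput per watt follows from both.

def inference_energy_j(avg_power_w: float, latency_s: float) -> float:
    """Energy in joules = average power (W) x wall-clock time (s)."""
    return avg_power_w * latency_s

def throughput_per_watt(tokens: int, latency_s: float, avg_power_w: float) -> float:
    """Tokens per joule: (tokens / s) / W, the metric that matters
    for power-constrained datacenters."""
    return (tokens / latency_s) / avg_power_w

# A GPU averaging 500 W over a 2 s request consumes 1,000 J; generating
# 256 tokens in that window yields 0.256 tokens per joule.
energy = inference_energy_j(500.0, 2.0)
tpj = throughput_per_watt(256, 2.0, 500.0)
```

The point of the decomposition is that two configurations with identical latency can differ severalfold in energy purely through power draw, which is why the paper treats utilization as a first-class latent metric rather than an implementation detail.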
Related papers
- Towards Green AI: Decoding the Energy of LLM Inference in Software Development [46.879983975894135]
AI-assisted tools are increasingly integrated into software development, but their reliance on large language models (LLMs) introduces substantial computational and energy costs. We conduct a phase-level analysis of LLM inference energy consumption, distinguishing between (1) prefill, where the model processes the input and builds internal representations, and (2) decoding, where output tokens are generated using the stored state.
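The prefill/decode distinction above lends itself to a simple two-phase energy model. A minimal sketch, with all latencies and wattages hypothetical rather than taken from the paper:

```python
def phase_energy_j(prefill_s: float, decode_s: float,
                   prefill_power_w: float, decode_power_w: float):
    """Split total inference energy into prefill and decode contributions."""
    e_prefill = prefill_power_w * prefill_s  # compute-bound: high power, brief
    e_decode = decode_power_w * decode_s     # memory-bound: lower power, long
    return e_prefill, e_decode, e_prefill + e_decode

# Even at lower average power, the long decode phase often dominates
# total energy for generation-heavy workloads.
ep, ed, total = phase_energy_j(0.2, 3.0, 600.0, 350.0)
```

Under these placeholder numbers decode contributes roughly 90% of the total, which is why phase-level attribution matters: optimizing prefill alone would barely move the energy bill of a long-generation workload.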
arXiv Detail & Related papers (2026-02-05T14:38:19Z) - Probabilistic energy profiler for statically typed JVM-based programming languages [1.7842332554022688]
Energy consumption is a growing concern in several fields, from mobile devices to large data centers. Previous approaches have a broader focus, such as on specific functions or programs, rather than source code statements. We develop a novel methodology to address the limitations of measuring only total consumption and relying on point estimates.
arXiv Detail & Related papers (2025-12-02T13:21:35Z) - Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation [50.21021246855702]
We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. Our results validate the compute-bound nature of diffusion inference and provide a foundation for sustainable AI deployment planning and carbon footprint estimation.
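The decomposition described above can be illustrated with a back-of-the-envelope FLOP-based estimate. All coefficients below are placeholders, not the paper's fitted values:

```python
def diffusion_energy_j(encode_flops: float, denoise_flops_per_step: float,
                       steps: int, decode_flops: float,
                       hw_flops_per_s: float, avg_power_w: float) -> float:
    """Energy ~= (total FLOPs / hardware FLOP/s) * average power,
    with the denoising cost repeated once per inference step."""
    total_flops = encode_flops + denoise_flops_per_step * steps + decode_flops
    return total_flops / hw_flops_per_s * avg_power_w

# With 50 denoising steps, the iterative term (2.5e15 FLOPs) dwarfs the
# one-shot encode (1e12) and decode (2e12) terms, consistent with the
# hypothesis that denoising dominates.
e = diffusion_energy_j(1e12, 5e13, 50, 2e12, 5e14, 700.0)
```

Because the denoising term scales linearly with step count in this model, halving the number of inference steps nearly halves the energy estimate, which is the lever step-distillation methods pull.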
arXiv Detail & Related papers (2025-11-21T08:12:47Z) - The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization [24.32172951691564]
Energy remains a metric that is often overlooked, under-explored, or poorly understood in the context of building ML systems. We present the ML.ENERGY Benchmark, a benchmark suite and tool for measuring inference energy consumption under realistic service environments.
arXiv Detail & Related papers (2025-05-09T18:27:32Z) - Unveiling the Energy Vampires: A Methodology for Debugging Software Energy Consumption [5.602876058122268]
This paper presents an energy debugging methodology for identifying and isolating energy consumption hotspots in software systems. Our analysis reveals significant energy consumption differences between Alpine and Ubuntu distributions. By isolating and benchmarking memcpy, we confirm it as the primary cause of the energy discrepancy.
arXiv Detail & Related papers (2024-12-13T11:49:19Z) - Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
Generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on a representative benchmark dataset using these models.
We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions.
arXiv Detail & Related papers (2023-11-28T15:09:36Z) - From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference [19.439683873290623]
Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art.
These models carry significant computational challenges, especially the compute and energy costs required for inference.
arXiv Detail & Related papers (2023-10-04T17:41:59Z) - Attention Mechanism with Energy-Friendly Operations [61.58748425876866]
We rethink attention mechanism from the energy consumption aspects.
We build a novel attention model by replacing multiplications with either selective operations or additions.
Empirical results on three machine translation tasks demonstrate that the proposed model achieves comparable accuracy.
arXiv Detail & Related papers (2022-04-28T08:50:09Z) - Compute and Energy Consumption Trends in Deep Learning Inference [67.32875669386488]
We study relevant models in the areas of computer vision and natural language processing.
For a sustained increase in performance we see a much softer growth in energy consumption than previously anticipated.
arXiv Detail & Related papers (2021-09-12T09:40:18Z) - Adversarial Energy Disaggregation for Non-intrusive Load Monitoring [78.47901044638525]
Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating the whole-home electricity usage into appliance-specific individual consumptions.
Recent advances reveal that deep neural networks (DNNs) can get favorable performance for NILM.
We introduce the idea of adversarial learning into NILM, which is new for the energy disaggregation task.
arXiv Detail & Related papers (2021-08-02T03:56:35Z) - Towards Accurate and Reliable Energy Measurement of NLP Models [20.289537200662306]
We show that existing software-based energy measurements are not accurate because they do not take into account hardware differences and how resource utilization affects energy consumption.
We quantify the error of existing software-based energy measurements by using a hardware power meter that provides highly accurate energy measurements.
Our key takeaway is the need for a more accurate energy estimation model that takes into account hardware variabilities and the non-linear relationship between resource utilization and energy consumption.
arXiv Detail & Related papers (2020-10-11T13:44:52Z)
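The non-linear utilization-to-energy relationship that the last paper highlights can be caricatured with a simple power model: idle power plus a sublinear dynamic term. The exponent and wattages below are illustrative, not measured values:

```python
def gpu_power_w(utilization: float, idle_w: float = 75.0,
                dynamic_w: float = 625.0, alpha: float = 0.6) -> float:
    """Power draw as a concave function of utilization (0.0 to 1.0).

    A linear model calibrated at full load would overestimate power
    at partial load, since alpha < 1 makes the curve concave.
    """
    assert 0.0 <= utilization <= 1.0
    return idle_w + dynamic_w * utilization ** alpha

# Halving utilization does not halve power, so software estimators that
# assume proportionality to utilization misattribute energy.
p_full = gpu_power_w(1.0)  # 700 W at full load under these parameters
p_half = gpu_power_w(0.5)  # well above 350 W
```

In this toy model, a job at 50% utilization draws roughly 70% of full-load power, which is exactly the kind of gap a hardware power meter exposes and a utilization-proportional software estimate misses.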
This list is automatically generated from the titles and abstracts of the papers on this site.