Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation
- URL: http://arxiv.org/abs/2511.17031v1
- Date: Fri, 21 Nov 2025 08:12:47 GMT
- Title: Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation
- Authors: Aniketh Iyengar, Jiaqi Han, Boris Ruf, Vincent Grari, Marcin Detyniecki, Stefano Ermon
- Abstract summary: We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. Our results validate the compute-bound nature of diffusion inference and provide a foundation for sustainable AI deployment planning and carbon footprint estimation.
- Score: 50.21021246855702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapidly growing computational demands of diffusion models for image generation have raised significant concerns about energy consumption and environmental impact. While existing approaches to energy optimization focus on architectural improvements or hardware acceleration, there is a lack of principled methods to predict energy consumption across different model configurations and hardware setups. We propose an adaptation of Kaplan scaling laws to predict GPU energy consumption for diffusion models based on computational complexity (FLOPs). Our approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, with the hypothesis that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. We conduct comprehensive experiments across four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) on three GPU architectures (NVIDIA A100, A4000, A6000), spanning various inference configurations including resolution (256x256 to 1024x1024), precision (fp16/fp32), step counts (10-50), and classifier-free guidance settings. Our energy scaling law achieves high predictive accuracy within individual architectures (R-squared > 0.9) and exhibits strong cross-architecture generalization, maintaining high rank correlations across models and enabling reliable energy estimation for unseen model-hardware combinations. These results validate the compute-bound nature of diffusion inference and provide a foundation for sustainable AI deployment planning and carbon footprint estimation.
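To make the decomposition concrete, here is a minimal, purely illustrative Python sketch: it splits per-image inference FLOPs into the text-encoding, denoising, and decoding terms named in the abstract, then fits a Kaplan-style power law E = a * C^b between total FLOPs C and measured energy E as a line in log-log space. Every constant and measurement below is a made-up placeholder, not a number from the paper.

```python
import numpy as np

# Hypothetical per-call FLOP counts (placeholders, not values from the paper).
F_TEXT, F_STEP, F_DECODE = 1e11, 5e12, 8e11

def inference_flops(steps, cfg=True):
    """Total FLOPs for one image: encode the prompt once, run the denoiser
    `steps` times (doubled under classifier-free guidance), decode once."""
    denoise_calls = steps * (2 if cfg else 1)  # CFG needs cond + uncond passes
    return F_TEXT + denoise_calls * F_STEP + F_DECODE

# Toy (steps, cfg, measured joules) runs; the paper fits real measurements
# across models, GPUs, resolutions, precisions, and step counts.
runs = [(10, True, 900.0), (25, True, 2150.0),
        (50, True, 4300.0), (50, False, 2250.0)]
C = np.array([inference_flops(s, g) for s, g, _ in runs])
E = np.array([e for _, _, e in runs])

# Kaplan-style power law E = a * C^b, fitted as a line in log-log space.
b, log_a = np.polyfit(np.log(C), np.log(E), 1)
a = float(np.exp(log_a))

def predict_energy_joules(total_flops):
    return a * total_flops ** b

def carbon_grams(energy_joules, grid_g_per_kwh=400.0):
    """Energy -> CO2e via a grid carbon intensity (400 g/kWh is a
    placeholder; real intensities vary by region and time)."""
    return energy_joules / 3.6e6 * grid_g_per_kwh

e_pred = predict_energy_joules(inference_flops(steps=30, cfg=True))
print(f"predicted: {e_pred:.0f} J, about {carbon_grams(e_pred):.2f} g CO2e")
```

Because the denoising term grows with the step count (and doubles under classifier-free guidance) while encoding and decoding are paid once per image, it dominates total FLOPs in exactly the compute-bound regime the abstract describes. A sketch of how the underlying energy measurements might be collected appears after the related-papers list below.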
Related papers
- SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation [62.14510717860079]
We propose a Synergistic Diffusion-Autoregression paradigm that unifies the training efficiency of autoregressive models with the parallel inference capability of diffusion. SDAR performs a lightweight paradigm conversion that transforms a well-trained autoregressive (AR) model into a blockwise diffusion model through brief, data-efficient adaptation. Building on this insight, SDAR achieves efficient AR-to-diffusion conversion with minimal cost, preserving AR-level performance while enabling parallel generation.
arXiv Detail & Related papers (2025-10-07T17:29:28Z)
- FoilDiff: A Hybrid Transformer Backbone for Diffusion-based Modelling of 2D Airfoil Flow Fields [1.5749416770494706]
FoilDiff is a diffusion-based surrogate model with a hybrid-backbone denoising network. It can provide both more accurate predictions and better-calibrated predictive uncertainty than existing diffusion-based models.
arXiv Detail & Related papers (2025-10-05T19:10:38Z)
- Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models [52.74448905289362]
EqM is a generative modeling framework built from an equilibrium dynamics perspective. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models.
arXiv Detail & Related papers (2025-10-02T17:59:06Z) - Distributed Cross-Channel Hierarchical Aggregation for Foundation Models [8.360214641005673]
We introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of transformer architecture, significantly improving computational efficiency. When integrated with tensor sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
arXiv Detail & Related papers (2025-06-26T15:58:14Z) - Controlled Latent Diffusion Models for 3D Porous Media Reconstruction [2.61662361742721]
Three-dimensional digital reconstruction of porous media presents a fundamental challenge in geoscience. We introduce a computational framework that addresses this challenge through latent diffusion models operating within the EDM framework. Our approach reduces dimensionality via a custom variational autoencoder trained on binary volumes, improving efficiency and also enabling the generation of larger volumes.
arXiv Detail & Related papers (2025-03-31T13:36:55Z) - EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models [12.931893842093718]
We present EXION, the first SW-HW co-designed diffusion accelerator. It exploits the unique inter- and intra-iteration output sparsity in diffusion models. It achieves dramatic improvements in performance and energy efficiency by 3.2-379.3x and 45.1-3067.6x compared to a server GPU, and by 42.6-1090.9x and 196.9-4668.2x compared to an edge GPU.
arXiv Detail & Related papers (2025-01-10T03:07:28Z) - Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3x sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z) - Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Memory-Efficient Fine-Tuning for Quantized Diffusion Model [12.875837358532422]
We introduce TuneQDM, a memory-efficient fine-tuning method for quantized diffusion models.
Our method consistently outperforms the baseline in both single-/multi-subject generations.
arXiv Detail & Related papers (2024-01-09T03:42:08Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
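As promised above, here is a minimal sketch of collecting the GPU-side energy readings a scaling law like the headline paper's would be fitted to. The paper does not specify its measurement harness; this version assumes the nvidia-ml-py (pynvml) bindings, prefers NVML's cumulative energy counter (exposed on Volta-class GPUs and newer), and falls back to integrating polled power draw, which is only an approximation.

```python
import time
import threading
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

def measure_gpu_energy_joules(fn, device_index=0, poll_s=0.1):
    """Run fn() and estimate the GPU energy it consumed, in joules."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        try:
            # Cumulative counter in millijoules since driver load (Volta+).
            start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
            fn()
            end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
            return (end_mj - start_mj) / 1e3
        except pynvml.NVMLError:
            # Fallback: sample instantaneous power (milliwatts) on a thread
            # and integrate: E is roughly mean power times wall-clock time.
            samples, stop = [], threading.Event()

            def poll():
                while not stop.is_set():
                    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1e3)  # W
                    time.sleep(poll_s)

            t = threading.Thread(target=poll, daemon=True)
            start = time.time()
            t.start()
            fn()
            stop.set()
            t.join()
            return (sum(samples) / max(len(samples), 1)) * (time.time() - start)
    finally:
        pynvml.nvmlShutdown()
```

Usage might look like `measure_gpu_energy_joules(lambda: pipe(prompt, num_inference_steps=30))` for some diffusers pipeline `pipe` (a hypothetical example); pairing each reading with the run's computed FLOPs yields the (C, E) points the power-law fit above consumes.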
This list is automatically generated from the titles and abstracts of the papers on this site.