Dissecting Transformers: A CLEAR Perspective towards Green AI
- URL: http://arxiv.org/abs/2510.02810v1
- Date: Fri, 03 Oct 2025 08:33:07 GMT
- Title: Dissecting Transformers: A CLEAR Perspective towards Green AI
- Authors: Hemang Jain, Shailender Goyal, Divyansh Pandey, Karthik Vaidhyanathan
- Abstract summary: We present the first fine-grained empirical analysis of inference energy across core components of the transformer architecture. We propose a novel methodology to overcome the temporal mismatch between microsecond-scale component execution and millisecond-scale energy sensors. Our findings establish detailed component-level energy baselines and provide insight as an initial step to build energy-efficient transformer models.
- Score: 0.6524460254566904
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid adoption of Large Language Models (LLMs) has raised significant environmental concerns. Unlike the one-time cost of training, LLM inference occurs continuously at a global scale and now dominates the AI energy footprint. Yet most sustainability studies report only coarse, model-level metrics due to the lack of fine-grained measurement methods, treating energy efficiency more as an afterthought than as a primary objective. We present the first fine-grained empirical analysis of inference energy across core components of the transformer architecture. We propose a novel methodology, Component-Level Energy Assessment via Repeated sampling (CLEAR), to overcome the temporal mismatch between microsecond-scale component execution and millisecond-scale energy sensor monitoring. Using CLEAR, we evaluate 15 models spanning four distinct architecture types and consistently keep component-wise energy variance below 9.5% while capturing more than 90% of each model's total energy through its individual components. Our empirical analysis reveals that Attention blocks consume significantly more energy per floating-point operation (FLOP), indicating that energy consumption is not proportionally aligned with FLOP counts. This shows that FLOPs alone fail to capture the true energy cost at a component level. Our findings establish detailed component-level energy baselines and provide insight as an initial step toward building energy-efficient transformer models through component-level optimizations.
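To make the repeated-sampling idea concrete, here is a minimal sketch of how a microsecond-scale component can be amortized over many repeats so that a millisecond-resolution energy counter can resolve it. It assumes an NVIDIA GPU, the pynvml bindings, and NVML's total-energy counter; the component, repeat count, and helper names are illustrative, not the authors' implementation.

```python
# Sketch: amortize a microsecond-scale component over many repeats so a
# millisecond-scale energy sensor can resolve it (repeated sampling, in the
# spirit of CLEAR; details here are assumptions, not the paper's code).
import torch
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def component_energy_joules(component, x, repeats=10_000):
    """Estimate per-call energy of `component` by repeating it `repeats` times."""
    torch.cuda.synchronize()
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # millijoules
    for _ in range(repeats):
        component(x)
    torch.cuda.synchronize()
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    return (end_mj - start_mj) / 1000.0 / repeats  # joules per single call

attn = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12).cuda().eval()
x = torch.randn(128, 1, 768, device="cuda")  # (seq, batch, embed)
with torch.no_grad():
    e = component_energy_joules(lambda t: attn(t, t, t), x)
print(f"~{e:.2e} J per attention call")
```

Repeating the component until the measurement window spans many sensor updates is what keeps the per-call variance low, at the cost of measuring the component in isolation rather than inside a full forward pass.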
Related papers
- Towards Green AI: Decoding the Energy of LLM Inference in Software Development [46.879983975894135]
AI-assisted tools are increasingly integrated into software development, but their reliance on large language models (LLMs) introduces substantial computational and energy costs. We conduct a phase-level analysis of LLM inference energy consumption, distinguishing between (1) the prefill phase, where the model processes the input and builds internal representations, and (2) the decoding phase, where output tokens are generated using the stored state.
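For intuition, a minimal sketch of how the two phases can be metered separately, assuming a small Hugging Face causal LM on a GPU and the same NVML energy counter; the model choice, token count, and greedy loop are illustrative assumptions, not the paper's setup.

```python
# Sketch: meter the prefill and decode phases separately on a small model.
import torch
import pynvml
from transformers import AutoModelForCausalLM, AutoTokenizer

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def energy_mj():
    return pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # millijoules

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").cuda().eval()
ids = tok("Energy-aware inference is", return_tensors="pt").input_ids.cuda()

with torch.no_grad():
    torch.cuda.synchronize(); e0 = energy_mj()
    out = model(ids, use_cache=True)      # prefill: whole prompt in one pass
    torch.cuda.synchronize(); e1 = energy_mj()
    past = out.past_key_values
    nxt = out.logits[:, -1:].argmax(-1)
    for _ in range(64):                   # decode: one token at a time
        out = model(nxt, past_key_values=past, use_cache=True)
        past = out.past_key_values
        nxt = out.logits[:, -1:].argmax(-1)
    torch.cuda.synchronize(); e2 = energy_mj()

print(f"prefill: {(e1 - e0) / 1e3:.3f} J  decode: {(e2 - e1) / 1e3:.3f} J")
```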
arXiv Detail & Related papers (2026-02-05T14:38:19Z)
- Comparing energy consumption and accuracy in text classification inference [0.9208007322096533]
This study systematically evaluates the trade-offs between model accuracy and energy consumption in text classification inference. The best-performing model in terms of accuracy can also be energy-efficient, while larger LLMs tend to consume significantly more energy with lower classification accuracy.
arXiv Detail & Related papers (2025-08-19T18:00:08Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations [2.2765705959685234]
This study investigates the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines. We employ software-based power measurements to ensure ease of replication across diverse configurations, models, and datasets.
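As an example of the software-based power reading such studies rely on, the sketch below reads Intel RAPL counters through the standard Linux powercap sysfs interface; which domains exist, whether reads need elevated permissions, and the counter's wrap behavior vary by machine, so treat this as illustrative rather than the study's tooling.

```python
# Sketch: software-based CPU energy reading via Linux powercap/RAPL.
# The sysfs paths follow the standard powercap layout; available domains
# and read permissions depend on the machine.
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0/energy_uj")  # package-0 domain
MAX_UJ = int(Path("/sys/class/powercap/intel-rapl:0/max_energy_range_uj").read_text())

def read_uj():
    return int(RAPL.read_text())

def measure_joules(fn):
    start = read_uj()
    fn()
    delta = read_uj() - start
    if delta < 0:          # counter wrapped around its maximum range
        delta += MAX_UJ
    return delta / 1e6     # microjoules -> joules

work = lambda: sum(i * i for i in range(10_000_000))
print(f"{measure_joules(work):.2f} J")
```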
arXiv Detail & Related papers (2025-03-31T10:28:04Z)
- Computing Within Limits: An Empirical Study of Energy Consumption in ML Training and Inference [2.553456266022126]
Machine learning (ML) has seen tremendous advancements, but its environmental footprint remains a concern.
Acknowledging the growing environmental impact of ML, this paper investigates Green ML.
arXiv Detail & Related papers (2024-06-20T13:59:34Z)
- On Feature Diversity in Energy-based Models [98.78384185493624]
An energy-based model (EBM) is typically formed of one or more inner models that learn a combination of different features to generate an energy mapping for each input configuration.
We extend the probably approximately correct (PAC) theory of EBMs and analyze the effect of redundancy reduction on the performance of EBMs.
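For reference, the generic EBM form this summary alludes to can be written as follows (a textbook formulation, not notation taken from the paper):

```latex
% Energy from D learned feature functions, and the Gibbs distribution it induces.
E_\theta(x) = \sum_{i=1}^{D} w_i \, f_i(x; \theta), \qquad
p_\theta(x) = \frac{\exp\bigl(-E_\theta(x)\bigr)}{\int \exp\bigl(-E_\theta(x')\bigr)\, dx'}
```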
arXiv Detail & Related papers (2023-06-02T12:30:42Z)
- FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs [27.006082622843653]
Vision applications of convolutional neural networks, such as always-on surveillance cameras, operate under tight energy constraints. This paper aims to serve as a baseline by designing detectors that reach trade-offs between energy and performance from two perspectives.
arXiv Detail & Related papers (2023-01-17T06:24:08Z)
- Transformer-based Context Condensation for Boosting Feature Pyramids in Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF).
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z)
- IrEne: Interpretable Energy Prediction for Transformers [15.677294441315535]
Existing software-based energy measurements of NLP models are not accurate because they do not consider the complex interactions between energy consumption and model execution.
We present IrEne, an interpretable energy prediction system that accurately predicts the inference energy consumption of a wide range of Transformer-based NLP models.
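To illustrate the flavor of interpretable energy prediction, here is a toy regression from cheap per-module resource features to measured energy; the features and data are synthetic placeholders and do not reflect IrEne's actual feature set or pipeline.

```python
# Sketch: regress measured module energies on cheap resource features
# (FLOPs, bytes moved, sequence length). Synthetic stand-in data only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 10.0, size=(200, 3))        # [flops, bytes_moved, seq_len]
true_w = np.array([0.8, 1.5, 0.2])               # hidden per-feature energy cost
y = X @ true_w + rng.normal(0, 0.05, 200)        # "measured" energy (J)

reg = LinearRegression().fit(X, y)
print("learned coefficients:", reg.coef_)        # interpretable per-feature cost
print("predicted energy:", reg.predict(X[:1])[0])
```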
arXiv Detail & Related papers (2021-06-02T14:43:51Z)
- Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models.
We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
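The contrast between a loss on masked positions only and a loss over the whole output can be sketched in a few lines; the tensors and masking rate below are toy stand-ins, not the paper's training setup.

```python
# Sketch: MLM loss restricted to masked positions vs. a loss over every
# output position; toy tensors, illustrating the contrast only.
import torch
import torch.nn.functional as F

batch, seq, vocab = 4, 16, 1000
logits = torch.randn(batch, seq, vocab)          # model outputs
labels = torch.randint(0, vocab, (batch, seq))   # original tokens
mask = torch.rand(batch, seq) < 0.15             # 15% of positions "masked"

# (a) classic MLM: gradient signal only at masked positions
masked_loss = F.cross_entropy(logits[mask], labels[mask])

# (b) whole-output loss: every position contributes to the training signal
full_loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1))

print(masked_loss.item(), full_loss.item())
```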
arXiv Detail & Related papers (2021-04-20T00:09:37Z)
- A Framework for Energy and Carbon Footprint Analysis of Distributed and Federated Edge Learning [48.63610479916003]
This article breaks down and analyzes the main factors that influence the environmental footprint of distributed learning policies.
It models both vanilla FL and consensus-driven decentralized FL policies.
Results show that FL allows remarkable end-to-end energy savings (30%-40%) for wireless systems characterized by low bit/Joule efficiency.
arXiv Detail & Related papers (2021-03-18T16:04:42Z)
- Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems [87.4519172058185]
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied.
A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem.
Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and energy cost by 22.4%.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)