Green LLM Techniques in Action: How Effective Are Existing Techniques for Improving the Energy Efficiency of LLM-Based Applications in Industry?
- URL: http://arxiv.org/abs/2601.02512v1
- Date: Mon, 05 Jan 2026 19:35:29 GMT
- Title: Green LLM Techniques in Action: How Effective Are Existing Techniques for Improving the Energy Efficiency of LLM-Based Applications in Industry?
- Authors: Pelin Rabia Kuran, Rumbidzai Chitakunye, Vincenzo Stoico, Ilja Heitlager, Justus Bogner
- Abstract summary: The rapid adoption of large language models (LLMs) has raised concerns about their substantial energy consumption. We analyzed an application in an industrial context at Schuberg Philis, a Dutch IT services company. Several techniques, such as Prompt Optimization and 2-bit Quantization, managed to reduce energy use significantly, sometimes by up to 90%. The only technique that achieved significant and strong energy reductions without harming the other qualities substantially was Small and Large Model Collaboration via Nvidia's Prompt Task and Complexity Classifier (NPCC).
- Score: 2.3683790724077864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid adoption of large language models (LLMs) has raised concerns about their substantial energy consumption, especially when deployed at industry scale. While several techniques have been proposed to address this, limited empirical evidence exists regarding the effectiveness of applying them to LLM-based industry applications. To fill this gap, we analyzed a chatbot application in an industrial context at Schuberg Philis, a Dutch IT services company. We then selected four techniques, namely Small and Large Model Collaboration, Prompt Optimization, Quantization, and Batching, applied them to the application in eight variations, and then conducted experiments to study their impact on energy consumption, accuracy, and response time compared to the unoptimized baseline. Our results show that several techniques, such as Prompt Optimization and 2-bit Quantization, managed to reduce energy use significantly, sometimes by up to 90%. However, these techniques especially impacted accuracy negatively, to a degree that is not acceptable in practice. The only technique that achieved significant and strong energy reductions without harming the other qualities substantially was Small and Large Model Collaboration via Nvidia's Prompt Task and Complexity Classifier (NPCC) with prompt complexity thresholds. This highlights that reducing the energy consumption of LLM-based applications is not difficult in practice. However, improving their energy efficiency, i.e., reducing energy use without harming other qualities, remains challenging. Our study provides practical insights to move towards this goal.
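As a concrete illustration of the best-performing technique, the sketch below routes prompts between a small and a large model based on a complexity threshold. The `complexity_score` heuristic, the stand-in models, and the threshold value are placeholders for illustration; they are not the paper's actual NPCC-based setup.

```python
# Hedged sketch of Small and Large Model Collaboration via a prompt
# complexity classifier with a threshold. The scoring heuristic and the
# stand-in models below are placeholders, not the authors' configuration.

def complexity_score(prompt: str) -> float:
    """Placeholder for a prompt complexity classifier (e.g. NPCC).
    Returns a score in [0, 1]; higher means a harder prompt."""
    # Trivial stand-in: longer, multi-question prompts count as harder.
    return min(1.0, len(prompt.split()) / 200 + prompt.count("?") * 0.1)

def route(prompt: str, small_llm, large_llm, threshold: float = 0.5) -> str:
    """Send easy prompts to the small model, hard ones to the large model."""
    if complexity_score(prompt) < threshold:
        return small_llm(prompt)   # cheaper, lower-energy path
    return large_llm(prompt)       # reserved for complex prompts

# Example with trivial stand-in "models":
small = lambda p: f"[small model answers] {p}"
large = lambda p: f"[large model answers] {p}"
print(route("What does our VPN policy say about travel?", small, large))
```

The energy saving comes from answering most prompts with the cheaper model; the threshold controls how much accuracy risk is traded for that saving.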
Related papers
- Optimising for Energy Efficiency and Performance in Machine Learning [3.8803432012641395]
We show that Energy Consumption Optimiser (ECOpt) optimises for energy efficiency and model performance. ECOpt quantifies the trade-off between these metrics as an interpretable frontier. We show that ECOpt can have a net positive environmental impact and use it to uncover seven models for CIFAR-10 that improve upon the state of the art.
arXiv Detail & Related papers (2026-01-13T21:28:58Z)
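The "interpretable frontier" in the ECOpt summary can be read as a Pareto front over energy and accuracy. The snippet below is a generic sketch of that computation, assuming each candidate model is summarized as an (energy, accuracy, name) tuple; it is not ECOpt's algorithm, and the candidate values are made up.

```python
# Generic Pareto-frontier sketch over (energy in joules, accuracy) pairs.
# Illustrative only; not ECOpt's actual optimisation procedure.

def pareto_frontier(models):
    """Return models not dominated by any other (lower energy AND higher accuracy)."""
    frontier = []
    for energy, acc, name in models:
        dominated = any(e <= energy and a >= acc and (e, a) != (energy, acc)
                        for e, a, _ in models)
        if not dominated:
            frontier.append((energy, acc, name))
    return sorted(frontier)

candidates = [(120.0, 0.91, "resnet50"), (40.0, 0.88, "mobilenet"),
              (60.0, 0.86, "distilled"), (200.0, 0.92, "ensemble")]
print(pareto_frontier(candidates))  # "distilled" drops out: dominated by "mobilenet"
```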
- How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices [81.85465545346266]
Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm. Yet, current open-source DLMs often underperform their AR counterparts in speed, limiting their real-world utility. This work presents a systematic study of DLM efficiency, identifying key issues in prior evaluation methods.
arXiv Detail & Related papers (2025-10-21T10:00:32Z)
- Energy-Driven Steering: Reducing False Refusals in Large Language Models [80.09252175869858]
We introduce Energy-Driven Steering (EDS), a novel, fine-tuning-free framework designed to resolve this challenge through dynamic, inference-time intervention. We train a lightweight, external Energy-Based Model (EBM) to assign high energy to undesirable (false refusal or jailbreak) states and low energy to desirable (helpful response or safe reject) ones. We use the gradient of the energy function to dynamically steer the LLM's hidden states to low-energy regions, correcting the model to generate a desirable response in real time without modifying its weights.
arXiv Detail & Related papers (2025-10-09T06:01:41Z)
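The steering step described in the EDS summary, moving hidden states down the gradient of an energy function, can be sketched in a few lines of PyTorch. The tiny energy network, step count, and learning rate below are illustrative placeholders rather than the paper's configuration.

```python
# Hedged sketch of energy-based steering of a hidden state, assuming an
# energy model E(h) that scores undesirable states higher. Not the EDS
# paper's exact setup; the EBM here is an untrained stand-in.
import torch
import torch.nn as nn

hidden_dim = 64
energy_model = nn.Sequential(  # stand-in EBM: maps a hidden state to a scalar energy
    nn.Linear(hidden_dim, 32), nn.ReLU(), nn.Linear(32, 1)
)

def steer(hidden_state: torch.Tensor, steps: int = 3, lr: float = 0.1) -> torch.Tensor:
    """Nudge a hidden state toward low-energy (desirable) regions via gradient descent."""
    h = hidden_state.detach().clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_model(h).sum()
        (grad,) = torch.autograd.grad(energy, h)
        h = (h - lr * grad).detach().requires_grad_(True)  # move against the energy gradient
    return h.detach()

steered = steer(torch.randn(1, hidden_dim))
```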
- Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights [2.1249213103048414]
The rapid adoption of large language models (LLMs) has led to significant energy consumption and carbon emissions. This paper explores the integration of energy-efficient optimization techniques in the deployment of LLMs to address these concerns.
arXiv Detail & Related papers (2025-04-07T21:56:59Z)
- Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency [6.306413686006502]
We conduct a comprehensive analysis of 28 quantized Large Language Models (LLMs) from the Ollama library. We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy in different quantization settings.
arXiv Detail & Related papers (2025-04-04T11:29:30Z)
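To make the Ollama-based comparison concrete, the sketch below measures decoding throughput for two example quantization tags via Ollama's local REST endpoint, assuming the documented `/api/generate` route and its `eval_count`/`eval_duration` response fields. The model tags are examples only, and energy measurement itself would require an external meter (e.g. RAPL or a wall-plug meter), so it is omitted here.

```python
# Minimal throughput sketch for quantized models served locally by Ollama.
# Assumes an Ollama server on the default port with the listed tags pulled.
import requests

MODELS = ["llama3:8b-instruct-q4_0", "llama3:8b-instruct-q8_0"]  # example tags
PROMPT = "Summarize our incident-response policy in two sentences."

for tag in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": tag, "prompt": PROMPT, "stream": False},
        timeout=300,
    ).json()
    tokens = resp["eval_count"]
    seconds = resp["eval_duration"] / 1e9  # reported in nanoseconds
    print(f"{tag}: {tokens / seconds:.1f} tokens/s")
```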
- Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations [45.243401722182554]
Large language models (LLMs) claim to assist developers in optimizing code for performance and energy efficiency. This work focuses on software written in Matlab, widely used in both academia and industry for scientific and engineering applications. We analyze energy-focused optimization on 400 scripts across 100 top GitHub repositories.
arXiv Detail & Related papers (2025-03-26T00:27:29Z)
- On the Effectiveness of Microservices Tactics and Patterns to Reduce Energy Consumption: An Experimental Study on Trade-Offs [3.928499292698212]
Microservice-based systems have established themselves in the software industry. Sustainability-related legislation and the growing costs of energy-hungry software increase the importance of energy efficiency for these systems. While some proposals for architectural tactics and patterns exist, their effectiveness, as well as potential trade-offs with other quality attributes (QAs), remains unclear.
arXiv Detail & Related papers (2025-01-24T11:15:23Z)
- Investigating Energy Efficiency and Performance Trade-offs in LLM Inference Across Tasks and DVFS Settings [1.781045155774463]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks. However, their inference workloads are computationally and energy intensive, raising concerns about sustainability and environmental impact.
arXiv Detail & Related papers (2025-01-14T16:02:33Z)
- Prompt engineering and its implications on the energy consumption of Large Language Models [4.791072577881446]
Large language models (LLMs) in software engineering pose severe challenges regarding computational resources, data centers, and carbon emissions. In this paper, we investigate how prompt engineering techniques (PETs) can impact the carbon emission of the Llama 3 model for the code generation task.
arXiv Detail & Related papers (2025-01-10T11:49:31Z)
- Impact of ML Optimization Tactics on Greener Pre-Trained ML Models [46.78148962732881]
This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) on 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
arXiv Detail & Related papers (2024-09-19T16:23:03Z)
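The four PyTorch tactics named in the abstract above can be sketched as follows; the toy model and pruning amounts are illustrative and do not reproduce the study's experimental setup.

```python
# Hedged sketch of the four PyTorch tactics: dynamic quantization,
# torch.compile, local pruning, and global pruning, applied to a toy model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# 1. Dynamic quantization: int8 weights for Linear layers at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# 2. torch.compile: JIT-compile the forward pass (PyTorch >= 2.0).
compiled = torch.compile(model)

# 3. Local pruning: zero out 30% of one layer's weights by L1 magnitude.
prune.l1_unstructured(model[0], name="weight", amount=0.3)

# 4. Global pruning: zero out 20% of weights across all Linear layers jointly.
prune.global_unstructured(
    [(m, "weight") for m in model if isinstance(m, nn.Linear)],
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)
```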
- One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models [42.95555008229016]
We propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need for any retraining.
The advantages of the proposed method become even more pronounced when the sparsity is extremely high.
arXiv Detail & Related papers (2023-10-14T05:43:09Z)
- Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation [58.550710456745726]
Fine-tuning is the most effective way of adapting pre-trained large language models (LLMs) to downstream applications.
Existing techniques for efficient fine-tuning can only achieve a limited reduction of fine-tuning FLOPs.
We present GreenTrainer, a new technique that adaptively evaluates different tensors' backpropagation costs and contributions to the fine-tuned model accuracy.
arXiv Detail & Related papers (2023-09-22T21:55:18Z)
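The idea of weighing each tensor's backpropagation cost against its accuracy contribution can be sketched as a selective-freezing loop. The scoring heuristic below (parameter count as a cost proxy, weight magnitude as a contribution proxy) is a placeholder and not GreenTrainer's actual selection rule.

```python
# Hedged sketch of adaptive backpropagation by selectively freezing tensors
# under a FLOP-like budget. Illustrative heuristic only, not GreenTrainer.
import torch.nn as nn

def freeze_low_value_tensors(model: nn.Module, budget_fraction: float = 0.5) -> None:
    """Keep gradients only for the most 'valuable' parameters within a budget."""
    scored = []
    for name, p in model.named_parameters():
        cost = p.numel()                       # proxy for backward-pass FLOPs
        contribution = p.abs().mean().item()   # placeholder contribution score
        scored.append((contribution / cost, name, p))
    scored.sort(reverse=True, key=lambda t: t[0])
    total = sum(p.numel() for _, _, p in scored)
    spent = 0
    for score, name, p in scored:
        p.requires_grad_(spent + p.numel() <= budget_fraction * total)
        spent += p.numel() if p.requires_grad else 0

# Usage: freeze_low_value_tensors(my_model, budget_fraction=0.5) before fine-tuning.
```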
- Adversarial Energy Disaggregation for Non-intrusive Load Monitoring [78.47901044638525]
Energy disaggregation, also known as non-intrusive load monitoring (NILM), tackles the problem of separating whole-home electricity usage into appliance-specific individual consumptions.
Recent advances reveal that deep neural networks (DNNs) can achieve favorable performance for NILM.
We introduce the idea of adversarial learning into NILM, which is new for the energy disaggregation task.
arXiv Detail & Related papers (2021-08-02T03:56:35Z)