Towards Sustainable Large Language Model Serving
- URL: http://arxiv.org/abs/2501.01990v1
- Date: Tue, 31 Dec 2024 03:18:10 GMT
- Title: Towards Sustainable Large Language Model Serving
- Authors: Sophia Nguyen, Beihao Zhou, Yi Ding, Sihang Liu
- Abstract summary: We study LLMs from a carbon emission perspective, addressing both operational and embodied emissions. We characterize the performance and energy of LLaMA with 1B, 3B, and 7B parameters using two Nvidia GPU types. We analytically model operational carbon emissions based on energy consumption and carbon intensities from three grid regions.
- Score: 3.085867867565808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study LLMs from a carbon emission perspective, addressing both operational and embodied emissions, and paving the way for sustainable LLM serving. We characterize the performance and energy of LLaMA with 1B, 3B, and 7B parameters using two Nvidia GPU types, a latest-generation RTX6000 Ada and an older-generation T4. We analytically model operational carbon emissions based on energy consumption and carbon intensities from three grid regions -- each representing a different energy source mix, and embodied carbon emissions based on chip area and memory size. Our characterization and modeling provide us with an in-depth understanding of the performance, energy, and carbon emissions of LLM serving. Our findings highlight the potential for optimizing sustainable LLM serving systems by considering both operational and embodied carbon emissions simultaneously.
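The modeling the abstract describes reduces to two terms: operational carbon (energy consumed times the grid's carbon intensity) and embodied carbon (manufacturing emissions scaled by chip area and memory size, amortized over device lifetime). The sketch below is a minimal illustration of that decomposition; the function names, per-area and per-GB factors, and the lifetime assumption are ours, not values from the paper.

```python
# Minimal sketch of the two-term carbon model described above.
# All coefficients (g_per_mm2, g_per_gb, lifetime) are illustrative
# assumptions, not values from the paper.

def operational_carbon(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions (gCO2eq): energy used times grid carbon intensity."""
    return energy_kwh * intensity_g_per_kwh

def embodied_carbon(chip_area_mm2: float, memory_gb: float,
                    usage_hours: float,
                    g_per_mm2: float = 300.0,        # assumed fab emissions per mm^2
                    g_per_gb: float = 150.0,         # assumed emissions per GB of DRAM
                    lifetime_hours: float = 5 * 365 * 24) -> float:
    """Embodied emissions (gCO2eq) attributed to a serving window,
    amortized linearly over an assumed 5-year device lifetime."""
    manufacturing = chip_area_mm2 * g_per_mm2 + memory_gb * g_per_gb
    return manufacturing * usage_hours / lifetime_hours

# Example: one hour of serving at 0.3 kWh on a 400 gCO2eq/kWh grid.
op = operational_carbon(0.3, 400.0)
em = embodied_carbon(chip_area_mm2=600.0, memory_gb=48.0, usage_hours=1.0)
print(f"operational {op:.0f} g + embodied {em:.0f} g = {op + em:.0f} gCO2eq")
```

Under these assumptions the embodied term is small per hour but fixed, which is why the abstract argues both terms must be considered together when optimizing serving systems.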
Related papers
- Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights [2.1249213103048414]
The rapid adoption of large language models (LLMs) has led to significant energy consumption and carbon emissions.
This paper explores the integration of energy-efficient optimization techniques in the deployment of LLMs to address these concerns.
arXiv Detail & Related papers (2025-04-07T21:56:59Z)
- AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services [14.664814078159282]
Large language models (LLMs) have become a growing concern due to their substantial energy consumption and carbon footprint.
We propose AOLO, a framework for analysis and optimization for low-carbon oriented wireless LLM services.
AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain.
We propose a low-carbon-oriented optimization algorithm, i.e., SNN-based deep reinforcement learning (SDRL).
arXiv Detail & Related papers (2025-03-06T13:21:38Z)
- CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs [0.0]
This paper analyzes the performance of Small Language Models (SLMs) and Vision Language Models (VLMs).
To quantify the trade-off between model performance and carbon emissions, we introduce a novel metric called CEGI (Carbon Efficient Gain Index).
Our findings suggest that the marginal gains in accuracy from larger models do not justify the substantial increase in carbon emissions.
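The abstract does not give CEGI's formula; one plausible reading of a "carbon efficient gain index" is accuracy gained per unit of additional emissions relative to a smaller baseline model. The sketch below illustrates only that hypothetical reading, not the paper's actual definition.

```python
# Hypothetical carbon-efficiency gain index in the spirit of CEGI.
# This formula is an illustrative reading, not the paper's definition.
def gain_index(acc_large: float, acc_base: float,
               carbon_large_g: float, carbon_base_g: float) -> float:
    """Accuracy points gained per extra kilogram of CO2eq emitted."""
    extra_carbon_kg = (carbon_large_g - carbon_base_g) / 1000.0
    return (acc_large - acc_base) / extra_carbon_kg

# A larger model gaining 1.5 accuracy points for 40 kg more CO2eq
# scores far lower than a small model gaining 1 point for 2 kg more.
print(gain_index(86.5, 85.0, 45_000, 5_000))  # ~0.0375 points/kg
print(gain_index(86.0, 85.0, 7_000, 5_000))   # 0.5 points/kg
```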
arXiv Detail & Related papers (2024-12-03T17:32:47Z)
- Carbon Footprint Accounting Driven by Large Language Models and Retrieval-augmented Generation [3.428260237038657]
Traditional life cycle assessment methods rely heavily on human expertise, making near-real-time updates challenging.
This paper introduces a novel approach integrating large language models (LLMs) with retrieval-augmented generation technology to enhance the real-time, professional, and economical aspects of carbon footprint information retrieval and analysis.
arXiv Detail & Related papers (2024-08-19T06:05:24Z)
- DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency [7.073435885680335]
We propose DynamoLLM, the first energy-management framework for generative large language models.
At a service-level, DynamoLLM conserves 53% energy and 38% operational carbon emissions, and reduces 61% cost to the customer.
arXiv Detail & Related papers (2024-08-01T17:40:45Z)
- Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce carbon emissions of Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z)
- LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models [7.132822974156601]
The carbon footprint of large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes.
We introduce LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs.
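As a rough illustration of what such a projection involves (not LLMCarbon's actual model), training compute can be approximated with the common 6 x parameters x tokens FLOP rule of thumb, converted to energy via sustained hardware throughput and datacenter overhead, and then to carbon via grid intensity. All constants below are generic assumptions.

```python
# Rough training-carbon projection; the 6*N*D FLOP rule of thumb and all
# hardware constants are generic assumptions, not LLMCarbon's model.
def project_training_carbon(params: float, tokens: float,
                            sustained_flops: float = 150e12,   # per GPU, assumed
                            gpu_power_kw: float = 0.4,         # assumed draw
                            pue: float = 1.2,                  # datacenter overhead
                            intensity_g_per_kwh: float = 400.0) -> float:
    """Projected training emissions in tonnes CO2eq."""
    total_flops = 6.0 * params * tokens          # common transformer estimate
    gpu_hours = total_flops / sustained_flops / 3600.0
    energy_kwh = gpu_hours * gpu_power_kw * pue
    return energy_kwh * intensity_g_per_kwh / 1e6

# Example: a 7B-parameter model trained on 1T tokens -> ~15 tonnes CO2eq.
print(f"{project_training_carbon(7e9, 1e12):.1f} tonnes CO2eq")
```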
arXiv Detail & Related papers (2023-09-25T14:50:04Z)
- Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning [77.62876532784759]
Machine learning (ML) requires energy to carry out computations during the model training process.
The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity used and the energy source.
We present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision.
arXiv Detail & Related papers (2023-02-16T18:35:00Z)
- Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model [72.65502770895417]
We quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle.
We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption.
We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of machine learning models.
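The headline figure is straightforward to reproduce from the paper's reported inputs, roughly 433 MWh of dynamic training energy on the French grid at about 57 gCO2eq/kWh (both values approximate):

```python
# Back-of-the-envelope check of the 24.7-tonne figure using the
# approximate inputs reported in the BLOOM paper.
dynamic_energy_kwh = 433_000        # ~433 MWh of dynamic training energy
grid_intensity = 57                 # gCO2eq/kWh, approx. French grid
tonnes = dynamic_energy_kwh * grid_intensity / 1e6
print(f"{tonnes:.1f} tonnes CO2eq")  # ~24.7
```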
arXiv Detail & Related papers (2022-11-03T17:13:48Z)
- Low Emission Building Control with Zero-Shot Reinforcement Learning [70.70479436076238]
Control via Reinforcement Learning (RL) has been shown to significantly improve building energy efficiency.
We show it is possible to obtain emission-reducing policies without a priori training data -- a paradigm we call zero-shot building control.
arXiv Detail & Related papers (2022-08-12T17:13:25Z)
- Mitigating Out-of-Distribution Data Density Overestimation in Energy-Based Models [54.06799491319278]
Deep energy-based models (EBMs) are receiving increasing attention due to their ability to learn complex distributions.
To train deep EBMs, the maximum likelihood estimation (MLE) with short-run Langevin Monte Carlo (LMC) is often used.
We investigate why the MLE with short-run LMC can converge to EBMs with wrong density estimates.
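As a minimal illustration of the sampler in question, short-run LMC runs only a few Langevin steps from a fixed initialization. The toy quadratic energy below stands in for a learned network's energy, and the step count and step size are arbitrary assumptions.

```python
import numpy as np

# Minimal short-run Langevin Monte Carlo on a toy quadratic energy
# E(x) = ||x||^2 / 2; in EBM training the gradient would come from
# a learned network, and the step count is kept small ("short-run").
def grad_energy(x: np.ndarray) -> np.ndarray:
    return x

def short_run_lmc(x0: np.ndarray, steps: int = 20,
                  step_size: float = 0.1,
                  rng: np.random.Generator | None = None) -> np.ndarray:
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        # Langevin update: descend the energy, then inject Gaussian noise.
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

samples = short_run_lmc(np.zeros((4, 2)))
```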
arXiv Detail & Related papers (2022-05-30T02:49:17Z)
- Learning Latent Space Energy-Based Prior Model [118.86447805707094]
We learn an energy-based model (EBM) in the latent space of a generator model.
We show that the learned model exhibits strong performances in terms of image and text generation and anomaly detection.
arXiv Detail & Related papers (2020-06-15T08:11:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.