Holistically Evaluating the Environmental Impact of Creating Language Models
- URL: http://arxiv.org/abs/2503.05804v1
- Date: Mon, 03 Mar 2025 22:16:15 GMT
- Title: Holistically Evaluating the Environmental Impact of Creating Language Models
- Authors: Jacob Morrison, Clara Na, Jared Fernandez, Tim Dettmers, Emma Strubell, Jesse Dodge
- Abstract summary: We estimate the real-world environmental impact of developing a series of language models, ranging from 20 million to 13 billion active parameters, trained on up to 5.6 trillion tokens each. We find that our series of models released 493 metric tons of carbon emissions, equivalent to powering about 98 homes in the United States for one year. We also find that power usage throughout training is not consistent, fluctuating between 15% and 85% of our hardware's maximum power draw.
- Score: 26.846990296567267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the performance of artificial intelligence systems has dramatically increased, so too has the environmental impact of creating these systems. While many model developers release estimates of the power consumption and carbon emissions from the final training runs for their latest models, there is comparatively little transparency into the impact of model development, hardware manufacturing, and total water usage throughout. In this work, we estimate the real-world environmental impact of developing a series of language models, ranging from 20 million to 13 billion active parameters, trained on up to 5.6 trillion tokens each. When accounting for hardware manufacturing, model development, and our final training runs, we find that our series of models released 493 metric tons of carbon emissions, equivalent to powering about 98 homes in the United States for one year, and consumed 2.769 million liters of water, equivalent to about 24.5 years of water usage by a person in the United States, even though our data center is extremely water-efficient. We measure and report the environmental impact of our model development; to the best of our knowledge we are the first to do so for LLMs, and we find that model development, the impact of which is generally not disclosed by most model developers, amounted to ~50% of that of training. By looking at detailed time series data for power consumption, we also find that power usage throughout training is not consistent, fluctuating between ~15% and ~85% of our hardware's maximum power draw, with negative implications for grid-scale planning as demand continues to grow. We close with a discussion on the continued difficulty of estimating the environmental impact of AI systems, and key takeaways for model developers and the public at large.
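As a rough illustration of how operational figures like these are derived, the sketch below converts GPU-hours into facility energy, carbon, and on-site water using a data center's PUE, WUE, and grid carbon intensity. The constants are illustrative assumptions, not the paper's measured values, and the sketch omits the hardware-manufacturing and model-development impacts that the paper also accounts for.

```python
# Back-of-envelope estimate of operational energy, carbon, and water for a
# training run. All constants are illustrative assumptions, NOT the measured
# values from this paper; embodied (manufacturing) impacts are omitted.

def training_footprint(gpu_hours: float,
                       avg_gpu_power_kw: float = 0.4,       # assumed average draw per GPU, in kW
                       pue: float = 1.2,                     # assumed data-center power usage effectiveness
                       grid_kg_co2e_per_kwh: float = 0.35,   # assumed grid carbon intensity
                       wue_l_per_kwh: float = 0.2):          # assumed on-site water usage effectiveness
    """Return (facility_energy_kwh, co2e_metric_tons, water_liters)."""
    it_energy_kwh = gpu_hours * avg_gpu_power_kw             # energy drawn by the accelerators
    facility_energy_kwh = it_energy_kwh * pue                # add cooling and facility overhead
    co2e_metric_tons = facility_energy_kwh * grid_kg_co2e_per_kwh / 1000.0
    water_liters = facility_energy_kwh * wue_l_per_kwh       # on-site cooling water
    return facility_energy_kwh, co2e_metric_tons, water_liters


if __name__ == "__main__":
    # Hypothetical run of one million GPU-hours, purely for illustration.
    energy, co2e, water = training_footprint(gpu_hours=1_000_000)
    print(f"{energy:,.0f} kWh | {co2e:,.1f} t CO2e | {water:,.0f} L water")
```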
Related papers
- Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs [96.68469559192846]
We present two differently sized MoE large language models (LLMs). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. We propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency.
arXiv Detail & Related papers (2025-03-07T04:43:39Z)
- Systematic Weight Evaluation for Pruning Large Language Models: Enhancing Performance and Sustainability [1.542607498220242]
This research focuses on the systematic evaluation of individual weight importance throughout the training process.
We propose a method that effectively reduces model size without compromising performance.
These findings highlight the critical need for optimized AI models to ensure sustainable development.
arXiv Detail & Related papers (2025-02-24T11:34:49Z)
- Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts [0.0]
We propose a methodology to estimate the environmental impact of a company's AI portfolio. Results confirm that large generative AI models consume up to 4600x more energy than traditional models. Mitigating the environmental impact of Generative AI by 2030 requires coordinated efforts across the AI value chain.
arXiv Detail & Related papers (2025-01-24T08:58:49Z)
- Reporting and Analysing the Environmental Impact of Language Models on the Example of Commonsense Question Answering with External Knowledge [7.419725234099729]
ChatGPT sparked social interest in Large Language Models (LLMs).
LLMs demand substantial computational resources and are very costly to train, both financially and environmentally.
In this study, we infused the T5 LLM with external knowledge and fine-tuned the model for the question-answering task.
arXiv Detail & Related papers (2024-07-24T16:16:16Z)
- Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
Generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on a representative benchmark dataset using these models (see the measurement sketch after this list).
We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions.
arXiv Detail & Related papers (2023-11-28T15:09:36Z)
- Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation [82.85015548989223]
Pentathlon is a benchmark for holistic and realistic evaluation of model efficiency.
Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle.
It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption.
arXiv Detail & Related papers (2023-07-19T01:05:33Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning [77.62876532784759]
Machine learning (ML) requires using energy to carry out computations during the model training process.
The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity used and the energy source.
We present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision.
arXiv Detail & Related papers (2023-02-16T18:35:00Z)
- Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model [72.65502770895417]
We quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle.
We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption.
We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of machine learning models.
arXiv Detail & Related papers (2022-11-03T17:13:48Z)
- Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models [10.086015702323971]
We follow the steps of the R&D group of a modern legal-tech start-up and present important insights on model development and deployment.
We start from ground zero by pre-training multiple domain-specific multi-lingual LMs which are a better fit to contractual and regulatory text.
We present benchmark results of such models in a half-public half-private legal benchmark comprising 5 downstream tasks, showing the impact of larger model size.
arXiv Detail & Related papers (2022-10-24T10:08:59Z)
- Towards Sustainable Deep Learning for Wireless Fingerprinting Localization [0.541530201129053]
Location-based services are becoming part of new wireless infrastructures and emerging business processes.
Deep Learning (DL) artificial intelligence methods perform very well in wireless fingerprinting localization based on extensive indoor radio measurement data.
With increasing complexity, these methods become computationally very intensive and energy-hungry.
We present a new DL-based architecture for indoor localization that is more energy efficient compared to related state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-22T15:13:44Z)
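As referenced in the "Power Hungry Processing" entry above, a minimal sketch of measuring the energy and carbon cost of 1,000 inferences might look like the following, here using the CodeCarbon library. The model, prompts, and generation settings are illustrative placeholders rather than that paper's exact benchmark setup.

```python
# Sketch: measure the energy/carbon cost of 1,000 inferences with CodeCarbon.
# The model, prompts, and generation settings below are placeholders, not the
# exact benchmark configuration used in the paper.
from codecarbon import EmissionsTracker
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # hypothetical stand-in model
prompts = ["Summarize the environmental impact of training language models."] * 1000

tracker = EmissionsTracker(project_name="inference-cost")
tracker.start()
for prompt in prompts:
    generator(prompt, max_new_tokens=32, do_sample=False)
emissions_kg = tracker.stop()  # estimated kg CO2eq for the 1,000 inferences

print(f"Estimated emissions for 1,000 inferences: {emissions_kg:.4f} kg CO2eq")
```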