Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository
Mining Study
- URL: http://arxiv.org/abs/2305.11164v3
- Date: Wed, 29 Nov 2023 23:07:15 GMT
- Title: Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository
Mining Study
- Authors: Joel Castaño, Silverio Martínez-Fernández, Xavier Franch, Justus Bogner
- Abstract summary: The study includes the first repository mining study of carbon emissions conducted through the Hugging Face Hub API.
This study seeks to answer two research questions: (1) how do ML model creators measure and report carbon emissions on Hugging Face Hub, and (2) what aspects impact the carbon emissions of training ML models?
- Score: 8.409033836300761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of machine learning (ML) systems has exacerbated their carbon
footprint due to increased capabilities and model sizes. However, there is
scarce knowledge on how the carbon footprint of ML models is actually measured,
reported, and evaluated. In light of this, the paper aims to analyze the
measurement of the carbon footprint of 1,417 ML models and associated datasets
on Hugging Face, which is the most popular repository for pretrained ML models.
The goal is to provide insights and recommendations on how to report and
optimize the carbon efficiency of ML models. The study includes the first
repository mining study of carbon emissions conducted through the Hugging Face
Hub API. It seeks to answer two research questions: (1) how do ML model
creators measure and report carbon emissions on Hugging Face Hub, and (2) what
aspects impact the carbon emissions of training ML models? The study yielded several
key findings. These include a stalled proportion of carbon emissions-reporting
models, a slight decrease in reported carbon footprint on Hugging Face over the
past 2 years, and a continued dominance of NLP as the main application domain.
Furthermore, the study uncovers correlations between carbon emissions and
various attributes such as model size, dataset size, and ML application
domains. These results highlight the need for software measurements to improve
energy reporting practices and promote carbon-efficient model development
within the Hugging Face community. In response to this issue, two
classifications are proposed: one for categorizing models based on their carbon
emission reporting practices and another for their carbon efficiency. The aim
of these classification proposals is to foster transparency and sustainable
model development within the ML community.
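As a concrete illustration of the mining approach described above, the sketch below queries the Hugging Face Hub for model-card metadata and checks the co2_eq_emissions field that the Hub uses for reported emissions. It relies on the public huggingface_hub Python client; parameter and attribute names can differ across client versions, and the efficiency buckets at the end are illustrative placeholders, not the classification scheme proposed in the paper.

```python
# Minimal sketch of Hub-API repository mining for carbon metadata.
# Assumes the public `huggingface_hub` client; attribute names vary
# across client versions, so fallbacks are used. The efficiency
# thresholds below are illustrative only.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(cardData=True, limit=500)  # sample of Hub models

reported, unreported = [], []
for model in models:
    # Model-card metadata: `cardData` (older clients) or `card_data` (newer).
    card = getattr(model, "cardData", None) or getattr(model, "card_data", None)
    card = card.to_dict() if hasattr(card, "to_dict") else (card or {})
    emissions = card.get("co2_eq_emissions")
    if isinstance(emissions, dict):  # the field may be a nested mapping
        emissions = emissions.get("emissions")
    model_id = getattr(model, "id", None) or model.modelId
    if emissions is None:
        unreported.append(model_id)
    else:
        reported.append((model_id, float(emissions)))

share = len(reported) / max(1, len(reported) + len(unreported))
print(f"share of sampled models reporting CO2eq: {share:.1%}")

# Hypothetical bucketing by reported kg of CO2eq (illustrative thresholds).
for model_id, kg_co2eq in reported[:10]:
    label = "low" if kg_co2eq < 10 else "medium" if kg_co2eq < 100 else "high"
    print(model_id, kg_co2eq, label)
```

A similar loop over the collected records is all that is needed to relate reported emissions to attributes such as model size, dataset size, or application domain.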
Related papers
- Machine Learning for Methane Detection and Quantification from Space -- A survey [49.7996292123687]
Methane (CH4) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than carbon dioxide (CO2) over 20 years.
This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands.
It reviews the state-of-the-art for traditional as well as Machine Learning (ML) approaches.
arXiv Detail & Related papers (2024-08-27T15:03:20Z) - CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling [9.05128569357374]
We present CarbonSense, the first machine learning-ready dataset for data-driven carbon flux modelling.
Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain.
arXiv Detail & Related papers (2024-06-07T13:47:40Z) - Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce the carbon emissions of the Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z) - Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training [9.182429523979598]
We evaluate the CO2 emissions of well-known large language models, which have an especially high carbon footprint due to their significant amount of model parameters.
We argue for the training of LLMs in a way that is responsible and sustainable by suggesting measures for reducing carbon emissions.
arXiv Detail & Related papers (2024-04-01T15:01:45Z) - LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language
Models [7.132822974156601]
The carbon footprint of large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes.
We introduce LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs.
arXiv Detail & Related papers (2023-09-25T14:50:04Z) - Machine Guided Discovery of Novel Carbon Capture Solvents [48.7576911714538]
Machine learning offers a promising method for reducing the time and resource burdens of materials development.
We have developed an end-to-end "discovery cycle" to select new aqueous amines compatible with the commercially viable acid gas scrubbing carbon capture process.
The prediction process shows 60% accuracy against experiment for both material parameters and 80% for a single parameter on an external test set.
arXiv Detail & Related papers (2023-03-24T18:32:38Z) - Counting Carbon: A Survey of Factors Influencing the Emissions of
Machine Learning [77.62876532784759]
Machine learning (ML) requires using energy to carry out computations during the model training process.
The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity used and the energy source.
We present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision.
arXiv Detail & Related papers (2023-02-16T18:35:00Z) - Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language
Model [72.65502770895417]
We quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle.
We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption.
We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of machine learning models.
arXiv Detail & Related papers (2022-11-03T17:13:48Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z) - Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine
Translation [0.0]
We study the carbon efficiency and look for alternatives to reduce the overall environmental impact of training models.
In our work, we assess the performance of models for machine translation, across multiple language pairs.
We examine the various components of these models to analyze aspects of our pipeline that can be optimized to reduce these carbon emissions.
arXiv Detail & Related papers (2021-09-26T12:30:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.