Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository
Mining Study
- URL: http://arxiv.org/abs/2305.11164v3
- Date: Wed, 29 Nov 2023 23:07:15 GMT
- Title: Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository
Mining Study
- Authors: Joel Castaño, Silverio Martínez-Fernández, Xavier Franch, Justus Bogner
- Abstract summary: The study includes the first repository mining study of carbon emissions conducted through the Hugging Face Hub API.
This study seeks to answer two research questions: (1) how do ML model creators measure and report carbon emissions on Hugging Face Hub, and (2) what aspects impact the carbon emissions of training ML models?
- Score: 8.409033836300761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of machine learning (ML) systems has exacerbated their carbon
footprint due to increased capabilities and model sizes. However, there is
scarce knowledge on how the carbon footprint of ML models is actually measured,
reported, and evaluated. In light of this, the paper aims to analyze the
measurement of the carbon footprint of 1,417 ML models and associated datasets
on Hugging Face, which is the most popular repository for pretrained ML models.
The goal is to provide insights and recommendations on how to report and
optimize the carbon efficiency of ML models. The study includes the first
repository mining study of carbon emissions conducted through the Hugging Face
Hub API. It seeks to answer two research questions: (1) how do ML model
creators measure and report carbon emissions on Hugging Face Hub, and (2) what
aspects impact the carbon emissions of training ML models? The study yielded several
key findings. These include a stalled proportion of carbon emissions-reporting
models, a slight decrease in reported carbon footprint on Hugging Face over the
past 2 years, and a continued dominance of NLP as the main application domain.
Furthermore, the study uncovers correlations between carbon emissions and
various attributes such as model size, dataset size, and ML application
domains. These results highlight the need for software measurements to improve
energy reporting practices and promote carbon-efficient model development
within the Hugging Face community. In response to this issue, two
classifications are proposed: one for categorizing models based on their carbon
emission reporting practices and another for their carbon efficiency. The aim
of these classification proposals is to foster transparency and sustainable
model development within the ML community.
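As a concrete illustration of the mining approach described above, the sketch below queries the Hugging Face Hub for model-card metadata and checks the co2_eq_emissions field that the Hub uses for reported emissions. It relies on the public huggingface_hub Python client; parameter and attribute names can differ across client versions, and the efficiency buckets at the end are illustrative placeholders, not the classification scheme proposed in the paper.

```python
# Minimal sketch of Hub-API repository mining for carbon metadata.
# Assumes the public `huggingface_hub` client; attribute names vary
# across client versions, so fallbacks are used. The efficiency
# thresholds below are illustrative only.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(cardData=True, limit=500)  # sample of Hub models

reported, unreported = [], []
for model in models:
    # Model-card metadata: `cardData` (older clients) or `card_data` (newer).
    card = getattr(model, "cardData", None) or getattr(model, "card_data", None)
    card = card.to_dict() if hasattr(card, "to_dict") else (card or {})
    emissions = card.get("co2_eq_emissions")
    if isinstance(emissions, dict):  # the field may be a nested mapping
        emissions = emissions.get("emissions")
    model_id = getattr(model, "id", None) or model.modelId
    if emissions is None:
        unreported.append(model_id)
    else:
        reported.append((model_id, float(emissions)))

share = len(reported) / max(1, len(reported) + len(unreported))
print(f"share of sampled models reporting CO2eq: {share:.1%}")

# Hypothetical bucketing by reported kg of CO2eq (illustrative thresholds).
for model_id, kg_co2eq in reported[:10]:
    label = "low" if kg_co2eq < 10 else "medium" if kg_co2eq < 100 else "high"
    print(model_id, kg_co2eq, label)
```

A similar loop over the collected records is all that is needed to relate reported emissions to attributes such as model size, dataset size, or application domain.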
Related papers
- Machine Learning for Methane Detection and Quantification from Space -- A survey [49.7996292123687]
Methane (CH4) is a potent anthropogenic greenhouse gas, contributing 86 times more to global warming than carbon dioxide (CO2) over 20 years.
This work expands existing information on operational methane point source detection sensors in the Short-Wave Infrared (SWIR) bands.
It reviews the state-of-the-art for traditional as well as Machine Learning (ML) approaches.
arXiv Detail & Related papers (2024-08-27T15:03:20Z) - CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling [9.05128569357374]
We present CarbonSense, the first machine learning-ready dataset for data-driven carbon flux modelling.
Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain.
arXiv Detail & Related papers (2024-06-07T13:47:40Z) - Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce the carbon emissions of the Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z) - Green AI: Exploring Carbon Footprints, Mitigation Strategies, and Trade Offs in Large Language Model Training [9.182429523979598]
We evaluate the CO2 emissions of well-known large language models, which have an especially high carbon footprint due to their significant amount of model parameters.
We argue for the training of LLMs in a way that is responsible and sustainable by suggesting measures for reducing carbon emissions.
arXiv Detail & Related papers (2024-04-01T15:01:45Z) - LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language
Models [7.132822974156601]
The carbon footprint of large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes.
We introduce LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs.
arXiv Detail & Related papers (2023-09-25T14:50:04Z) - Machine Guided Discovery of Novel Carbon Capture Solvents [48.7576911714538]
Machine learning offers a promising method for reducing the time and resource burdens of materials development.
We have developed an end-to-end "discovery cycle" to select new aqueous amines compatible with the commercially viable acid gas scrubbing carbon capture process.
The prediction process shows 60% accuracy against experiment for both material parameters and 80% for a single parameter on an external test set.
arXiv Detail & Related papers (2023-03-24T18:32:38Z) - Counting Carbon: A Survey of Factors Influencing the Emissions of
Machine Learning [77.62876532784759]
Machine learning (ML) requires using energy to carry out computations during the model training process.
The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity used and the energy source.
We present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision.
arXiv Detail & Related papers (2023-02-16T18:35:00Z) - Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language
Model [72.65502770895417]
We quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle.
We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption.
We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of machine learning models.
arXiv Detail & Related papers (2022-11-03T17:13:48Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z) - Curb Your Carbon Emissions: Benchmarking Carbon Emissions in Machine
Translation [0.0]
We study the carbon efficiency and look for alternatives to reduce the overall environmental impact of training models.
In our work, we assess the performance of models for machine translation, across multiple language pairs.
We examine the various components of these models to analyze aspects of our pipeline that can be optimized to reduce these carbon emissions.
arXiv Detail & Related papers (2021-09-26T12:30:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.