LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences
- URL: http://arxiv.org/abs/2410.02950v1
- Date: Thu, 3 Oct 2024 19:48:45 GMT
- Title: LLMCO2: Advancing Accurate Carbon Footprint Prediction for LLM Inferences
- Authors: Zhenxiao Fu, Fan Chen, Shan Zhou, Haitong Li, Lei Jiang
- Abstract summary: Estimating the carbon footprint of large language model (LLM) inferences is more complex than estimating that of training.
LLMCO2 is a graph neural network (GNN)-based model that greatly improves the accuracy of LLM inference carbon footprint predictions.
- Score: 7.137654106298203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Throughout its lifecycle, a large language model (LLM) generates a substantially larger carbon footprint during inference than during training. LLM inference requests vary in batch size, prompt length, and number of generated tokens, while cloud providers employ different GPU types and quantities to meet diverse service-level objectives for accuracy and latency. It is crucial for both users and cloud providers to have a tool that quickly and accurately estimates the carbon impact of LLM inferences based on a combination of inference request and hardware configurations before execution. Estimating the carbon footprint of LLM inferences is more complex than doing so for training, due to lower and highly variable model FLOPS utilization, which renders previous equation-based models inaccurate. Additionally, existing machine learning (ML) prediction methods either lack accuracy or demand extensive training data, as they inadequately handle the distinct prefill and decode phases, overlook hardware-specific features, and inefficiently sample uncommon inference configurations. We introduce LLMCO2, a graph neural network (GNN)-based model that greatly improves the accuracy of LLM inference carbon footprint predictions compared to previous methods.
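For intuition, here is a minimal sketch of the equation-based style of estimator the abstract argues breaks down for inference: it converts request FLOPs into runtime via an assumed model FLOPS utilization (MFU), then into energy and grams of CO2eq. Every constant below (peak throughput, power draw, MFU, PUE, grid intensity) is an illustrative assumption, not a value from the paper.

```python
# Illustrative equation-based carbon estimator for one LLM inference request.
# All constants are assumptions for this sketch, not numbers from the paper.

def inference_flops(n_params: float, prompt_len: int, gen_len: int, batch: int) -> float:
    """Rough ~2*N FLOPs per processed token (prefill + decode)."""
    tokens = batch * (prompt_len + gen_len)
    return 2.0 * n_params * tokens

def carbon_grams(flops: float,
                 peak_flops: float = 312e12,  # assumed A100 BF16 peak
                 mfu: float = 0.10,           # assumed utilization; highly variable
                 gpu_power_w: float = 400.0,  # assumed board power
                 pue: float = 1.2,            # assumed datacenter overhead
                 grid_gco2_per_kwh: float = 400.0) -> float:
    seconds = flops / (peak_flops * mfu)
    kwh = gpu_power_w * pue * seconds / 3.6e6  # W*s -> kWh
    return kwh * grid_gco2_per_kwh

f = inference_flops(n_params=7e9, prompt_len=512, gen_len=256, batch=8)
print(f"{carbon_grams(f):.3f} gCO2eq")
```

Because the true MFU differs sharply between the prefill and decode phases and across batch sizes and GPUs, a single fixed mfu makes this style of model inaccurate; that gap is what the GNN-based predictor targets.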
Related papers
- Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more efficient metric for performance estimation.
We extend the power law analytical function to predict domain-specific pre-training loss based on FLOPs across data sources.
We employ a two-layer neural network to model the non-linear relationship between multiple domain-specific losses and downstream performance.
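As a rough, synthetic-data illustration of that second step (shapes, domains, and values are invented, not taken from the paper), one could fit such a mapping with scikit-learn:

```python
# Hypothetical sketch: map per-domain pre-training losses to a downstream
# metric with a two-layer network. Data here is synthetic, for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# 40 (model, checkpoint) points, 3 domain losses each (e.g. web, code, books)
domain_losses = rng.uniform(1.5, 4.0, size=(40, 3))
# Pretend downstream accuracy falls non-linearly as losses rise
downstream = 1.0 / (1.0 + np.exp(domain_losses.mean(axis=1) - 2.5))

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(domain_losses, downstream)
print(model.predict([[2.0, 2.2, 1.9]]))  # predicted downstream score
```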
arXiv Detail & Related papers (2024-10-11T04:57:48Z)
- Getting the most out of your tokenizer for pre-training and domain adaptation [26.427537023771844]
We show that the size, pre-tokenization regular expression, and training data of a tokenizer can significantly impact the model's generation speed.
We specialize the tokenizer of a pre-trained LLM to obtain large gains in generation speed and effective context size.
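A quick way to see the effect (a sketch: the model names are arbitrary public examples and sample.txt is any representative text file you supply): count tokens per character under two tokenizers. Fewer tokens per character means the same text generates faster and more of it fits in a fixed context window.

```python
# Sketch: compare how densely two tokenizers encode the same text.
# "gpt2" and "bert-base-uncased" are arbitrary public examples.
from transformers import AutoTokenizer

text = open("sample.txt").read()  # any representative corpus slice
for name in ["gpt2", "bert-base-uncased"]:
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok(text)["input_ids"])
    print(f"{name}: {n_tokens} tokens, {len(text) / n_tokens:.2f} chars/token")
```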
arXiv Detail & Related papers (2024-02-01T21:49:34Z)
- LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models [7.132822974156601]
The carbon footprint of large language models (LLMs) is a significant concern, encompassing emissions from their training, inference, experimentation, and storage processes.
We introduce LLMCarbon, an end-to-end carbon footprint projection model designed for both dense and MoE LLMs.
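In that spirit, a back-of-the-envelope training-side projection looks like the following (a sketch with assumed constants, not LLMCarbon's calibrated model; for an MoE LLM one would count activated rather than total parameters):

```python
# Illustrative projection of *training* carbon from model and data size.
# Constants are assumptions, not LLMCarbon's calibrated values.

def training_carbon_kg(n_params: float, n_tokens: float,
                       peak_flops: float = 312e12,  # assumed per-GPU peak
                       mfu: float = 0.40,           # training MFU is far steadier
                       gpu_power_w: float = 400.0,
                       pue: float = 1.1,
                       grid_gco2_per_kwh: float = 400.0) -> float:
    # For MoE, pass the *activated* parameter count per token as n_params.
    flops = 6.0 * n_params * n_tokens            # standard ~6*N*D estimate
    gpu_seconds = flops / (peak_flops * mfu)
    kwh = gpu_power_w * pue * gpu_seconds / 3.6e6
    return kwh * grid_gco2_per_kwh / 1000.0

print(f"{training_carbon_kg(7e9, 1e12):.0f} kgCO2eq")  # 7B model, 1T tokens
```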
arXiv Detail & Related papers (2023-09-25T14:50:04Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
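The bookkeeping behind such an equal-compute protocol is simple; a sketch with invented throughput numbers (not the paper's measurements):

```python
# Sketch of the equal-compute comparison: fix a budget in accelerator hours,
# then each model's token budget follows from its measured throughput.
budget_hours = 96.0

models = {
    "feed-forward (GPT-2 style)": 21_000,    # assumed tokens/sec on a reference GPU
    "novel LSTM (10x throughput)": 210_000,  # assumed tokens/sec
}
for name, tok_per_s in models.items():
    tokens = tok_per_s * budget_hours * 3600
    print(f"{name}: {tokens / 1e9:.1f}B tokens within the {budget_hours}h budget")
```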
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales [65.01417261415833]
We present an approach to predict the pre-training loss based on our observations that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws.
With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B.
Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models.
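A sketch of the loss-forecasting step (synthetic points; the coefficients and the 52B target are illustrative, and muP's contribution is that hyperparameters transfer across scales, which is what makes small-model fits trustworthy):

```python
# Sketch: fit a power law on losses from small proxy models, then extrapolate.
# All data below is synthetic; coefficients are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c

n_params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])      # small proxy models
loss = power_law(n_params, 8.0, 0.08, 1.7) \
       + 0.01 * np.random.default_rng(1).standard_normal(5)

(a, b, c), _ = curve_fit(power_law, n_params, loss, p0=(5.0, 0.1, 1.0),
                         maxfev=20000)
print(f"forecast loss at 52B params: {power_law(52e9, a, b, c):.3f}")
```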
arXiv Detail & Related papers (2023-04-14T00:45:01Z)
- Sequential Learning Of Neural Networks for Prequential MDL [18.475866691786695]
We evaluate approaches for computing prequential description lengths for image classification datasets with neural networks.
Considering the computational cost, we find that online learning with rehearsal performs favorably.
We present description lengths for a suite of image classification datasets that improve upon previously reported results by large margins.
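Conceptually, a prequential description length encodes each data block with the model trained only on earlier blocks, then updates the model. A runnable sketch (SGDClassifier and the synthetic data stand in for the paper's neural networks and image datasets):

```python
# Sketch of a prequential (online) description length on synthetic data.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.array([0, 1])
bits = 0.0
for Xb, yb in zip(np.array_split(X, 10), np.array_split(y, 10)):
    if hasattr(model, "coef_"):               # model has seen earlier blocks
        p = model.predict_proba(Xb)[np.arange(len(yb)), yb]
        bits += -np.log2(np.clip(p, 1e-12, 1.0)).sum()
    else:
        bits += len(yb) * 1.0                 # uniform code for the first block
    model.partial_fit(Xb, yb, classes=classes)  # online update (no rehearsal here)
print(f"prequential codelength: {bits:.0f} bits")
```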
arXiv Detail & Related papers (2022-10-14T16:30:23Z)
- LEAPER: Modeling Cloud FPGA-based Systems via Transfer Learning [13.565689665335697]
We propose LEAPER, a transfer learning-based approach for FPGA-based systems that adapts an existing ML-based model to a new, unknown environment.
Results show that our approach delivers, on average, 85% accuracy when we use our transferred model for prediction in a cloud environment with 5-shot learning.
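Not LEAPER's actual method, but a minimal flavor of few-shot model transfer: train a base performance model on the source environment, then adapt it to a new environment with five labeled points via a learned correction (all data below is synthetic):

```python
# Sketch: 5-shot adaptation of a performance model via residual correction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
w = np.array([3, 1, 0, 2, 0.5, 1])
X_src = rng.uniform(size=(500, 6))            # design features (assumed)
y_src = X_src @ w + rng.normal(0, 0.1, 500)

base = GradientBoostingRegressor().fit(X_src, y_src)

X_tgt = rng.uniform(size=(5, 6))              # the 5 "shots" from the new environment
y_tgt = 1.3 * (X_tgt @ w) + 0.4               # affinely shifted target environment

corr = LinearRegression().fit(base.predict(X_tgt).reshape(-1, 1), y_tgt)
predict = lambda X: corr.predict(base.predict(X).reshape(-1, 1))
print(predict(rng.uniform(size=(3, 6))))      # adapted predictions
```

A linear correction on the base model's outputs is enough here because the target environment is an affine shift of the source; real cross-environment gaps are messier, which is why LEAPER learns the transfer.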
arXiv Detail & Related papers (2022-08-22T21:25:56Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques that address the efficiency of PLM pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
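For reference, the underlying primitive is min-sum message passing (max-product in negative-log space). A sketch on a 1-D chain with a truncated linear pairwise penalty, with random costs standing in for CNN unaries (this illustrates the primitive, not the paper's learned layer):

```python
# Min-sum belief propagation on a 1-D chain of n variables with L labels.
import numpy as np

def chain_bp(unary, lam=1.0, tau=3.0):
    n, L = unary.shape
    d = np.abs(np.arange(L)[:, None] - np.arange(L)[None, :])
    pair = lam * np.minimum(d, tau)        # truncated linear pairwise penalty
    fwd = np.zeros((n, L))
    bwd = np.zeros((n, L))
    for i in range(1, n):                  # left-to-right messages
        fwd[i] = np.min(unary[i - 1] + fwd[i - 1] + pair, axis=1)
    for i in range(n - 2, -1, -1):         # right-to-left messages
        bwd[i] = np.min(unary[i + 1] + bwd[i + 1] + pair, axis=1)
    beliefs = unary + fwd + bwd
    return beliefs.argmin(axis=1)          # per-variable labeling

print(chain_bp(np.random.default_rng(0).uniform(size=(16, 8))))
```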
arXiv Detail & Related papers (2020-03-13T13:11:35Z)