The Ecological Footprint of Neural Machine Translation Systems
- URL: http://arxiv.org/abs/2202.02170v1
- Date: Fri, 4 Feb 2022 14:56:41 GMT
- Title: The Ecological Footprint of Neural Machine Translation Systems
- Authors: Dimitar Shterionov and Eva Vanmassenhove
- Abstract summary: This chapter focuses on the ecological footprint of neural MT systems.
It starts from the power drain during the training of, and inference with, neural MT models and moves towards the environmental impact.
The overall CO2 offload is calculated for Ireland and the Netherlands.
- Score: 2.132096006921048
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Over the past decade, deep learning (DL) has led to significant advancements
in various fields of artificial intelligence, including machine translation
(MT). These advancements would not be possible without the ever-growing volumes
of data and the hardware that allows large DL models to be trained efficiently.
Due to their large number of computing cores and dedicated memory, graphics
processing units (GPUs) are a more effective hardware solution than central
processing units (CPUs) for training and inference with DL models. However,
GPUs are very power-demanding, and their electrical power consumption has
economic as well as ecological implications.
This chapter focuses on the ecological footprint of neural MT systems. It
starts from the power drain during the training of, and inference with,
neural MT models and moves towards the environmental impact, in terms of
carbon dioxide emissions. Different architectures (RNN and Transformer) and
different GPUs (the consumer-grade NVIDIA 1080Ti and the workstation-grade
NVIDIA P100) are compared. Then, the overall CO2 offload is calculated for
Ireland and the Netherlands. The NMT models and their ecological impact are
compared to common household appliances to draw a clearer picture.
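The carbon figures referenced above follow from a simple relation: the energy drawn during training or inference (in kWh) multiplied by the carbon intensity of the local electricity grid (kg CO2 per kWh). A minimal sketch of that calculation is given below; the average wattage and the per-country grid intensities are illustrative assumptions, not the chapter's measured values.

```python
# Minimal sketch: converting measured GPU energy into CO2 emissions.
# All numbers below are illustrative placeholders, not the chapter's figures.

def co2_kg(energy_kwh: float, grid_kg_per_kwh: float) -> float:
    """Emissions (kg CO2) = energy used (kWh) * grid carbon intensity (kg CO2/kWh)."""
    return energy_kwh * grid_kg_per_kwh

# Hypothetical training run: ~200 W average draw for 48 hours.
# (Average draw can be logged with `nvidia-smi --query-gpu=power.draw --format=csv`.)
energy_kwh = 0.200 * 48  # kW * h = 9.6 kWh

# Hypothetical grid carbon intensities (kg CO2 per kWh), not official figures.
grids = {"Ireland": 0.35, "Netherlands": 0.40}

for country, intensity in grids.items():
    print(f"{country}: {co2_kg(energy_kwh, intensity):.2f} kg CO2")
```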
The last part of this chapter analyses quantization, a technique for reducing
the size and complexity of models, as a way to reduce power consumption. As
quantized models can run on CPUs, they present a power-efficient inference
solution without depending on a GPU.
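To make the quantization point concrete, the sketch below applies post-training dynamic quantization in PyTorch, which stores linear-layer weights as int8 so that inference can run on a CPU without a GPU. The toy feed-forward model is a stand-in under that assumption, not the chapter's RNN or Transformer NMT systems.

```python
import torch
import torch.nn as nn

# Toy stand-in for an NMT network; NOT the chapter's actual models.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Post-training dynamic quantization: Linear weights become int8,
# and the quantized model runs on the CPU at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 512])
```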
Related papers
- MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs [55.95879347182669]
The MoE architecture is renowned for its ability to increase model capacity without a proportional increase in inference cost.
MoE-Lightning introduces a novel CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high resource utilization.
MoE-Lightning can achieve up to 10.3x higher throughput than state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a single T4 GPU (16GB).
arXiv Detail & Related papers (2024-11-18T01:06:12Z) - BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery [66.97700597098215]
We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models.
On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days.
The BioNeMo Framework is open-source and free for everyone to use.
arXiv Detail & Related papers (2024-11-15T19:46:16Z) - ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation [4.77407121905745]
Back-propagation (BP) is a major source of computational expense when training deep learning models.
We propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture.
arXiv Detail & Related papers (2024-08-22T17:22:59Z) - Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale [20.30679358575365]
Recent large language models require considerable resources to train and deploy.
With the right amount of power-capping, we show significant decreases in both temperature and power draw.
Our work is the first to conduct and make available a detailed analysis of the effects of GPU power-capping at the supercomputing scale.
arXiv Detail & Related papers (2024-02-25T02:22:34Z) - Towards Physical Plausibility in Neuroevolution Systems [0.276240219662896]
The growing use of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), drives up power consumption during training and inference.
This work addresses the growing energy consumption problem in Machine Learning (ML).
Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment.
arXiv Detail & Related papers (2024-01-31T10:54:34Z) - Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
Generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on a representative benchmark dataset using these models.
We conclude with a discussion of the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions.
arXiv Detail & Related papers (2023-11-28T15:09:36Z) - Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z) - Cramming: Training a Language Model on a Single GPU in One Day [64.18297923419627]
Recent trends in language modeling have focused on increasing performance through scaling.
We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU.
We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings.
arXiv Detail & Related papers (2022-12-28T18:59:28Z) - On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB of memory and 1MB of Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z) - M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining [55.16088793437898]
Training extreme-scale models requires enormous amounts of compute and memory.
We propose a simple training strategy called "Pseudo-to-Real" for large models with high memory-footprint requirements.
arXiv Detail & Related papers (2021-10-08T04:24:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.