Impact of ML Optimization Tactics on Greener Pre-Trained ML Models
- URL: http://arxiv.org/abs/2409.12878v1
- Date: Thu, 19 Sep 2024 16:23:03 GMT
- Title: Impact of ML Optimization Tactics on Greener Pre-Trained ML Models
- Authors: Alexandra González Álvarez, Joel Castaño, Xavier Franch, Silverio Martínez-Fernández
- Abstract summary: This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations.
We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) on 42 Hugging Face models for image classification.
Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems.
- Score: 46.78148962732881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Given the fast-paced nature of today's technology, which has surpassed human performance in tasks like image classification, visual reasoning, and English understanding, assessing the impact of Machine Learning (ML) on energy consumption is crucial. Traditionally, ML projects have prioritized accuracy over energy, leaving a gap in attention to energy consumption during model inference. Aims: This study aims to (i) analyze image classification datasets and pre-trained models, (ii) improve inference efficiency by comparing optimized and non-optimized models, and (iii) assess the economic impact of the optimizations. Method: We conduct a controlled experiment to evaluate the impact of various PyTorch optimization techniques (dynamic quantization, torch.compile, local pruning, and global pruning) on 42 Hugging Face models for image classification. The metrics examined include GPU utilization, power and energy consumption, accuracy, time, computational complexity, and economic costs. The models are repeatedly evaluated to quantify the effects of these software engineering tactics. Results: Dynamic quantization demonstrates significant reductions in inference time and energy consumption, making it highly suitable for large-scale systems. Additionally, torch.compile balances accuracy and energy. In contrast, local pruning shows no positive impact on performance, and global pruning's longer optimization times significantly impact costs. Conclusions: This study highlights the role of software engineering tactics in achieving greener ML models, offering guidelines for practitioners to make informed decisions on optimization methods that align with sustainability goals.
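For readers unfamiliar with these tactics, the sketch below shows how each of the four is typically applied in PyTorch. It is a minimal illustration under stated assumptions, not the paper's experimental harness; the checkpoint name and the 30% pruning ratio are illustrative choices, not values from the study.
```python
import copy
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForImageClassification

# Load a pre-trained image classifier (checkpoint name is illustrative).
base = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224"
).eval()

# 1) Dynamic quantization: Linear weights stored as int8,
#    activations quantized on the fly at inference time (CPU).
quantized = torch.ao.quantization.quantize_dynamic(
    copy.deepcopy(base), {torch.nn.Linear}, dtype=torch.qint8
)

# 2) torch.compile: JIT-compiles the forward pass into optimized
#    kernels on the first call.
compiled = torch.compile(copy.deepcopy(base))

# 3) Local pruning: within each Linear layer, zero the 30% of weights
#    with the smallest L1 magnitude (the ratio is an assumption here).
locally_pruned = copy.deepcopy(base)
for m in locally_pruned.modules():
    if isinstance(m, torch.nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.3)
        prune.remove(m, "weight")  # bake the mask into the weights

# 4) Global pruning: rank weights across all Linear layers jointly,
#    then zero the 30% smallest overall.
globally_pruned = copy.deepcopy(base)
to_prune = [(m, "weight") for m in globally_pruned.modules()
            if isinstance(m, torch.nn.Linear)]
prune.global_unstructured(
    to_prune, pruning_method=prune.L1Unstructured, amount=0.3
)
for m, name in to_prune:
    prune.remove(m, name)

# Run a single inference pass on a dummy input with one variant.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = quantized(pixel_values=x).logits
```
In a benchmark like the one described, each variant would then be evaluated separately for accuracy, inference time, and energy consumption.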
Related papers
- Automated Computational Energy Minimization of ML Algorithms using Constrained Bayesian Optimization [1.2891210250935148]
We evaluate Constrained Bayesian Optimization (CBO) with the primary objective of minimizing energy consumption.
We demonstrate that CBO lowers energy consumption without compromising the predictive performance of ML models.
arXiv Detail & Related papers (2024-07-08T09:49:38Z)
- Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks [6.596361762662328]
The internal structure and operating mechanisms of large-scale language models are analyzed theoretically.
We evaluate the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies.
arXiv Detail & Related papers (2024-05-20T00:10:00Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to balance the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
- Design Amortization for Bayesian Optimal Experimental Design [70.13948372218849]
We build on successful variational approaches, which optimize a parameterized variational model with respect to bounds on the expected information gain (EIG).
We present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs.
arXiv Detail & Related papers (2022-10-07T02:12:34Z)
- Deep Learning-based Prediction of Key Performance Indicators for Electrical Machine [0.0]
A data-aided, deep learning-based meta-model is employed to predict the design of an electrical machine quickly and with high accuracy.
The results show high prediction accuracy and demonstrate the validity of a deep learning-based meta-model for minimizing optimization time.
arXiv Detail & Related papers (2020-12-16T18:03:58Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration, and how some of the lower-scoring models on standard benchmarks perform as well as the best-performing models when trained on the same data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
- End-to-end deep metamodeling to calibrate and optimize energy loads [0.0]
We propose a new end-to-end methodology to optimize the energy performance and the comfort, air quality and hygiene of large buildings.
A metamodel based on a Transformer network is introduced and trained using a dataset sampled with a simulation program.
arXiv Detail & Related papers (2020-06-19T07:40:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.