An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training
- URL: http://arxiv.org/abs/2509.13516v1
- Date: Tue, 16 Sep 2025 20:20:22 GMT
- Title: An Analysis of Optimizer Choice on Energy Efficiency and Performance in Neural Network Training
- Authors: Tom Almog
- Abstract summary: This paper presents a comprehensive study investigating the relationship between optimizer choice and energy efficiency in neural network training. We conducted 360 controlled experiments across three benchmark datasets (MNIST, CIFAR-10, CIFAR-100) using eight popular optimizers. Our findings reveal substantial trade-offs between training speed, accuracy, and environmental impact that vary across datasets and model complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As machine learning models grow increasingly complex and computationally demanding, understanding the environmental impact of training decisions becomes critical for sustainable AI development. This paper presents a comprehensive empirical study investigating the relationship between optimizer choice and energy efficiency in neural network training. We conducted 360 controlled experiments across three benchmark datasets (MNIST, CIFAR-10, CIFAR-100) using eight popular optimizers (SGD, Adam, AdamW, RMSprop, Adagrad, Adadelta, Adamax, NAdam) with 15 random seeds each. Using CodeCarbon for precise energy tracking on Apple M1 Pro hardware, we measured training duration, peak memory usage, carbon dioxide emissions, and final model performance. Our findings reveal substantial trade-offs between training speed, accuracy, and environmental impact that vary across datasets and model complexity. We identify AdamW and NAdam as consistently efficient choices, while SGD demonstrates superior performance on complex datasets despite higher emissions. These results provide actionable insights for practitioners seeking to balance performance and sustainability in machine learning workflows.
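The comparison protocol described above — run each optimizer from the same starting point, then record steps to convergence and wall-clock time — can be sketched in miniature. The toy below is an assumption-laden stand-in, not the paper's harness: it compares hand-written SGD and Adam updates on a one-dimensional quadratic and uses step count and elapsed time as crude proxies for the energy, memory, and emissions metrics the study actually collected with CodeCarbon on Apple M1 Pro hardware.

```python
import time

def sgd_step(w, g, state, lr=0.1):
    # Plain gradient descent: no optimizer state to carry.
    return w - lr * g, state

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias-corrected first/second moments.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # debias first moment
    v_hat = v / (1 - b2 ** t)   # debias second moment
    return w - lr * m_hat / (v_hat ** 0.5 + eps), (m, v, t)

def run(step_fn, state, w=5.0, tol=1e-6, max_steps=10_000):
    # Toy objective f(w) = w^2 with gradient 2w; count steps until |w| < tol.
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        g = 2.0 * w
        w, state = step_fn(w, g, state)
        if abs(w) < tol:
            break
    return step, time.perf_counter() - start, w

sgd_steps, sgd_time, sgd_w = run(sgd_step, None)
adam_steps, adam_time, adam_w = run(adam_step, (0.0, 0.0, 0))
print(f"SGD : {sgd_steps} steps in {sgd_time:.6f}s")
print(f"Adam: {adam_steps} steps in {adam_time:.6f}s")
```

In the real study each optimizer would be a `torch.optim` instance, the loop an epoch over a dataset, and the timer replaced by CodeCarbon's emissions tracker; the sketch only shows why per-optimizer step counts and durations can diverge enough to matter for energy accounting.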
Related papers
- On Electric Vehicle Energy Demand Forecasting and the Effect of Federated Learning [1.2599533416395765]
Energy Demand Forecasting of Electric Vehicle Supply Equipments (EVSEs) is one of the most critical operations for ensuring efficient energy management and sustainability. It enables utility providers to anticipate energy/power demand, optimize resource allocation, and implement proactive measures to improve grid reliability. As concerns and restrictions about privacy and sustainability have grown, training data has become increasingly fragmented.
arXiv Detail & Related papers (2026-02-24T11:21:45Z) - UserRL: Training Interactive User-Centric Agent via Reinforcement Learning [104.63494870852894]
Reinforcement learning (RL) has shown promise in training agentic models that engage in dynamic, multi-turn interactions. We propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments.
arXiv Detail & Related papers (2025-09-24T03:33:20Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [61.145371212636505]
Reinforcement learning (RL) learns policies through trial and error, while optimal control plans actions using a learned or known dynamics model. We systematically analyze the performance of different RL and control-based methods under datasets of varying quality. Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data efficiency.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - EXAdam: The Power of Adaptive Cross-Moments [0.0]
This paper introduces EXAdam, a novel optimization algorithm that builds upon the widely used Adam algorithm. EXAdam incorporates two key enhancements: (1) new debiasing terms for improved moment estimation and (2) a gradient-based acceleration mechanism. Empirical evaluations demonstrate EXAdam's superiority over Adam, achieving 38.46% faster convergence and yielding improvements of 1.96%, 2.17%, and 1.17% in training, validation, and testing accuracies.
arXiv Detail & Related papers (2024-12-29T00:11:54Z) - Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables the use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z) - ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation [4.77407121905745]
Back-propagation (BP) is a major source of computational expense during training deep learning models. We propose a general, energy-efficient convolution module that can be seamlessly integrated into any deep learning architecture.
arXiv Detail & Related papers (2024-08-22T17:22:59Z) - Estimating Deep Learning energy consumption based on model architecture and training environment [5.465797591588829]
We investigate how model architecture and training environment affect energy consumption. We find that selecting the right model-training environment combination can reduce training energy consumption by up to 80.68%. We propose the Stable Training Epoch Projection (STEP) and the Pre-training Regression-based Estimation (PRE) methods.
arXiv Detail & Related papers (2023-07-07T12:07:59Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI [8.025202812165412]
We treat energy consumption as a metric of equal importance to accuracy and aim to eliminate irrelevant tasks and unnecessary energy usage.
We examine the training stage of the deep learning pipeline from a sustainability perspective.
We highlight innovative and promising energy-efficient practices for training deep learning models.
arXiv Detail & Related papers (2023-03-24T12:48:21Z) - How Do Adam and Training Strategies Help BNNs Optimization? [50.22482900678071]
We show that Adam is better equipped to handle the rugged loss surface of BNNs and reaches a better optimum with higher generalization ability.
We derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-06-21T17:59:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.