Related papers: Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures

Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures

URL: http://arxiv.org/abs/2508.13163v1
Date: Mon, 28 Jul 2025 03:25:44 GMT
Title: Sustainable AI Training via Hardware-Software Co-Design on NVIDIA, AMD, and Emerging GPU Architectures
Authors: Yashasvi Makin, Rahul Maliakkal,
Abstract summary: Large-scale deep learning and artificial intelligence model training uses a lot of computational power and energy, so it poses serious sustainability issues.<n>This work explores environmentally driven performance optimization methods for advanced GPU architectures from NVIDIA, AMD, and other emerging GPU architectures.<n>Our main focus is on investigating hardware-software compiler co-design techniques meant to significantly increase memory-level and kernel-level operations.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In particular, large-scale deep learning and artificial intelligence model training uses a lot of computational power and energy, so it poses serious sustainability issues. The fast rise in model complexity has resulted in exponential increases in energy consumption, increasing the demand for techniques maximizing computational efficiency and lowering environmental impact. This work explores environmentally driven performance optimization methods especially intended for advanced GPU architectures from NVIDIA, AMD, and other emerging GPU architectures. Our main focus is on investigating hardware-software co-design techniques meant to significantly increase memory-level and kernel-level operations, so improving performance-per-watt measures. Our thorough research encompasses evaluations of specialized tensor and matrix cores, advanced memory optimization methods, and creative integration approaches that taken together result in notable energy efficiency increases. We also discuss important software-level optimizations that augment hardware capability including mixed-precision arithmetic, advanced energy-aware scheduling algorithms, and compiler-driven kernel enhancements. Moreover, we methodically point out important research gaps and suggest future directions necessary to create really sustainable artificial intelligence systems. This paper emphasizes how major increases in training efficiency can be obtained by co-design of hardware and software, so lowering the environmental impact of artificial intelligence without compromising performance. To back up our analysis, we use real-world case studies from top companies like Meta, Google, Amazon, and others that show how these sustainable AI training methods are used in the real world.

Related papers

The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads [0.0]
The remarkable progress in Artificial Intelligence (AI) is foundation-ally linked to a concurrent revolution in computer architecture.<n>As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits.<n>This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads.
arXiv Detail & Related papers (2025-11-13T06:26:39Z)
Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [58.50944604905037]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Energy Considerations of Large Language Model Inference and Efficiency Optimizations [28.55549828393871]
As large language models (LLMs) scale in size and adoption, their computational and environmental costs continue to rise.<n>We systematically analyze the energy implications of common inference efficiency optimizations across diverse NLP and AI workloads.<n>Our findings reveal that the proper application of relevant inference efficiency optimizations can reduce total energy use by up to 73% from unoptimized baselines.
arXiv Detail & Related papers (2025-04-24T15:45:05Z)
A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.<n> deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.<n>This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
Investigation of Energy-efficient AI Model Architectures and Compression Techniques for "Green" Fetal Brain Segmentation [42.52549987351643]
Fetal brain segmentation in medical imaging is challenging due to the small size of the fetal brain and the limited image quality of fast 2D sequences. Deep neural networks are a promising method to overcome this challenge. Our study aims to explore model architectures and compression techniques that promote energy efficiency.
arXiv Detail & Related papers (2024-04-03T15:11:53Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM) Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs [14.397623940689487]
Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms are reviewed. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators.
arXiv Detail & Related papers (2023-11-08T01:06:25Z)
Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks. Deep learning models usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios. New research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications [3.2634122554914]
We examine the computational energy requirements of different systems driven by the geometrical scaling law. We show that energy efficiency due to geometrical scaling is slowing down. At the application level, general-purpose AI-ML methods can be computationally energy intensive.
arXiv Detail & Related papers (2022-10-12T16:14:33Z)
AI-based Modeling and Data-driven Evaluation for Smart Manufacturing Processes [56.65379135797867]
We propose a dynamic algorithm for gaining useful insights about semiconductor manufacturing processes. We elaborate on the utilization of a Genetic Algorithm and Neural Network to propose an intelligent feature selection algorithm.
arXiv Detail & Related papers (2020-08-29T14:57:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.