Carbon Aware Transformers Through Joint Model-Hardware Optimization
- URL: http://arxiv.org/abs/2505.01386v2
- Date: Thu, 08 May 2025 22:21:11 GMT
- Title: Carbon Aware Transformers Through Joint Model-Hardware Optimization
- Authors: Irene Wang, Newsha Ardalani, Mostafa Elhoushi, Daniel Jiang, Samuel Hsia, Ekin Sumbul, Divya Mahajan, Carole-Jean Wu, Bilge Acun
- Abstract summary: There is a lack of tools to holistically quantify and optimize the total carbon footprint of machine learning systems. We propose CATransformers, a carbon-aware architecture search framework that enables sustainability-driven co-optimization of ML models and hardware architectures. We apply our framework to multi-modal CLIP-based models, producing CarbonCLIP, a family of CLIP models achieving up to 17% reduction in total carbon emissions.
- Score: 6.731982787210602
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid growth of machine learning (ML) systems necessitates a more comprehensive evaluation of their environmental impact, particularly their carbon footprint, which comprises operational carbon from training and inference execution and embodied carbon from hardware manufacturing and its entire life-cycle. Despite the increasing importance of embodied emissions, there is a lack of tools and frameworks to holistically quantify and optimize the total carbon footprint of ML systems. To address this, we propose CATransformers, a carbon-aware architecture search framework that enables sustainability-driven co-optimization of ML models and hardware architectures. By incorporating both operational and embodied carbon metrics into early design space exploration of domain-specific hardware accelerators, CATransformers demonstrates that optimizing for carbon yields design choices distinct from those optimized solely for latency or energy efficiency. We apply our framework to multi-modal CLIP-based models, producing CarbonCLIP, a family of CLIP models achieving up to 17% reduction in total carbon emissions while maintaining accuracy and latency, compared with state-of-the-art small CLIP baselines designed for edge deployment. This work underscores the need for holistic optimization methods to design high-performance, environmentally sustainable AI systems.
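To make the optimization target concrete, the sketch below scores hypothetical (model, hardware) design points by their total footprint, i.e. operational plus embodied carbon, and picks the lowest-carbon point that still meets accuracy and latency constraints. The `Candidate` fields, the constants, and the `search` helper are illustrative assumptions for this summary, not CATransformers' actual interface.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A hypothetical (model, hardware) design point."""
    accuracy: float       # task accuracy of the candidate model
    latency_ms: float     # estimated inference latency on the accelerator
    energy_kwh: float     # lifetime inference energy of the workload
    chip_area_mm2: float  # accelerator die area, a driver of embodied carbon

# Illustrative constants; real values depend on the grid mix and fab process.
GRID_INTENSITY = 0.4     # kgCO2e per kWh of operational energy
EMBODIED_PER_MM2 = 0.05  # kgCO2e per mm^2 of silicon, amortized over lifetime

def total_carbon(c: Candidate) -> float:
    """Total footprint = operational + embodied carbon (kgCO2e)."""
    operational = c.energy_kwh * GRID_INTENSITY
    embodied = c.chip_area_mm2 * EMBODIED_PER_MM2
    return operational + embodied

def search(candidates, max_latency_ms: float, min_accuracy: float) -> Candidate:
    """Return the lowest-carbon design that meets accuracy/latency targets."""
    feasible = [c for c in candidates
                if c.latency_ms <= max_latency_ms and c.accuracy >= min_accuracy]
    return min(feasible, key=total_carbon)
```

Note how a design that minimizes latency or energy alone can still lose under `total_carbon` if it needs a larger die: that is the distinction the abstract draws between carbon-optimal and latency- or energy-optimal choices.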
Related papers
- CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference [34.693462786320545]
CoFormer is a collaborative inference system for general transformer models. CoFormer enables the efficient inference of GPT2-XL with 1.6 billion parameters on edge devices, reducing memory requirements by 76.3%.
arXiv Detail & Related papers (2025-08-28T02:50:12Z) - Diffusion-Modeled Reinforcement Learning for Carbon and Risk-Aware Microgrid Optimization [48.70916202664808]
DiffCarl is a diffusion-modeled carbon- and risk-aware reinforcement learning algorithm for intelligent operation of multi-microgrid systems. It outperforms classic algorithms and state-of-the-art DRL solutions, with 2.3-30.1% lower operational cost. It also achieves 28.7% lower carbon emissions than its carbon-unaware variant and reduces performance variability.
arXiv Detail & Related papers (2025-07-22T03:27:07Z) - Carbon-Efficient 3D DNN Acceleration: Optimizing Performance and Sustainability [7.059399419316343]
3D integration improves performance but introduces sustainability challenges. We propose a carbon-efficient design methodology for 3D accelerators. Our approach effectively reduces silicon area and fabrication overhead while maintaining high computational accuracy.
arXiv Detail & Related papers (2025-04-14T03:48:37Z) - AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services [14.664814078159282]
Large language models (LLMs) have become a growing concern due to their substantial energy consumption and carbon footprint. We propose AOLO, a framework for analysis and optimization for low-carbon oriented wireless LLM services. AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain. We further propose a low-carbon-oriented optimization algorithm: SNN-based deep reinforcement learning (SDRL).
arXiv Detail & Related papers (2025-03-06T13:21:38Z) - Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z) - STAR: Synthesis of Tailored Architectures [61.080157488857516]
We propose a new approach for the synthesis of tailored architectures (STAR). Our approach combines a novel search space based on the theory of linear input-varying systems, supporting a hierarchical numerical encoding into architecture genomes. STAR genomes are automatically refined and recombined with gradient-free, evolutionary algorithms to optimize for multiple model quality and efficiency metrics. Using STAR, we optimize large populations of new architectures, leveraging diverse computational units and interconnection patterns, improving over highly-optimized Transformers and striped hybrid models on the frontier of quality, parameter size, and inference cache for autoregressive language modeling.
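As a rough illustration of the gradient-free refinement this summary describes, the toy loop below evolves a population of integer genomes by selection, recombination, and mutation. The genome length, fitness function, and operators are placeholders invented for this sketch; STAR's actual hierarchical encoding and multi-metric evaluation are far richer.

```python
import random

random.seed(0)
GENOME_LEN = 8  # hypothetical encoding length

def fitness(genome):
    """Stand-in for multi-metric scoring (quality, size, cache);
    real evaluation would decode and measure the architecture."""
    return -sum((g - 3) ** 2 for g in genome)  # toy optimum: all genes equal 3

def mutate(genome):
    child = list(genome)
    child[random.randrange(GENOME_LEN)] = random.randint(0, 7)
    return child

def recombine(a, b):
    cut = random.randrange(1, GENOME_LEN)  # single-point crossover
    return a[:cut] + b[cut:]

# Gradient-free evolutionary loop over a population of genomes.
population = [[random.randint(0, 7) for _ in range(GENOME_LEN)] for _ in range(32)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:8]  # keep the fittest
    children = [mutate(recombine(random.choice(parents), random.choice(parents)))
                for _ in range(24)]
    population = parents + children

print(population[0], fitness(population[0]))  # best genome found
```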
arXiv Detail & Related papers (2024-11-26T18:42:42Z) - Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment [3.391499691517567]
Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices.
We propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization.
Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators.
arXiv Detail & Related papers (2024-07-16T12:36:10Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Carbon Market Simulation with Adaptive Mechanism Design [55.25103894620696]
A carbon market is a market-based tool that incentivizes economic agents to align individual profits with the global utility.
We propose an adaptive mechanism design framework, simulating the market using hierarchical, model-free multi-agent reinforcement learning (MARL).
Numerical results show MARL enables government agents to balance productivity, equality, and carbon emissions.
arXiv Detail & Related papers (2024-06-12T05:08:51Z) - Beyond Efficiency: Scaling AI Sustainably [4.711003829305544]
Modern AI applications have driven ever-increasing computing demands.
This paper characterizes the carbon impact of AI, including both operational carbon emissions from training and inference as well as embodied carbon emissions from hardware manufacturing.
arXiv Detail & Related papers (2024-06-08T00:07:16Z) - Carbon-aware Software Services [3.105112058253643]
This article proposes a novel framework for implementing, configuring and assessing carbon-aware interactive software services.
We propose a methodology to implement carbon-aware services leveraging the Strategy design pattern to feature alternative service versions with different energy consumption.
We devise a bilevel optimisation scheme to configure which version to use at different times of the day, based on forecasts of carbon intensity and service requests.
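The Strategy pattern mentioned here maps naturally to code: each service version is one strategy, and a scheduler picks among them from a carbon-intensity forecast. The sketch below is a minimal, assumed illustration; the class names and the 300 gCO2e/kWh threshold are invented for this example, and a real switching schedule would come from the paper's bilevel optimisation.

```python
from abc import ABC, abstractmethod

class ServiceStrategy(ABC):
    """One version of the service; versions trade quality for energy."""
    @abstractmethod
    def handle(self, request: str) -> str: ...

class FullQualityStrategy(ServiceStrategy):
    def handle(self, request: str) -> str:
        return f"full-quality answer to {request}"  # highest energy use

class LowPowerStrategy(ServiceStrategy):
    def handle(self, request: str) -> str:
        return f"approximate answer to {request}"   # lowest energy use

def pick_strategy(forecast_gco2_per_kwh: float) -> ServiceStrategy:
    # Hypothetical threshold standing in for the bilevel optimiser's
    # per-time-slot decision based on carbon and request forecasts.
    if forecast_gco2_per_kwh > 300:
        return LowPowerStrategy()
    return FullQualityStrategy()

service = pick_strategy(forecast_gco2_per_kwh=420.0)
print(service.handle("GET /report"))  # dirty grid -> low-power version runs
```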
arXiv Detail & Related papers (2024-05-21T08:26:38Z) - Generative AI for Low-Carbon Artificial Intelligence of Things with Large Language Models [67.0243099823109]
Generative AI (GAI) holds immense potential to reduce the carbon emissions of the Artificial Intelligence of Things (AIoT).
In this article, we explore the potential of GAI for carbon emissions reduction and propose a novel GAI-enabled solution for low-carbon AIoT.
We propose a Large Language Model (LLM)-enabled carbon emission optimization framework, in which we design pluggable LLM and Retrieval Augmented Generation (RAG) modules.
arXiv Detail & Related papers (2024-04-28T05:46:28Z) - Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors [5.432613942292548]
Transformer models have achieved remarkable success in various machine learning tasks but suffer from high computational complexity and resource requirements.
Specialized AI hardware accelerators, such as the Habana GAUDI architecture, offer a promising solution to tackle these issues.
This paper explores the untapped potential of using GAUDI processors to accelerate Transformer-based models, addressing key challenges in the process.
arXiv Detail & Related papers (2023-09-29T04:49:35Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to balance the demand for high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference [6.0093441900032465]
We propose a framework that simulates transformer inference and training on a design space of accelerators.
We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain the best-performing models.
The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair.
arXiv Detail & Related papers (2023-03-27T02:45:18Z) - Real-time high-resolution CO$_2$ geological storage prediction using nested Fourier neural operators [58.728312684306545]
Carbon capture and storage (CCS) plays an essential role in global decarbonization.
Scaling up CCS deployment requires accurate and high-resolution modeling of the storage reservoir pressure buildup and the gaseous plume migration.
We introduce the Nested Fourier Neural Operator (FNO), a machine-learning framework for high-resolution dynamic 3D CO$_2$ storage modeling at a basin scale.
arXiv Detail & Related papers (2022-10-31T04:04:03Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity and propose measuring operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
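In that framing, operational emissions are simply energy use integrated against the grid's time-varying carbon intensity. The snippet below works through the arithmetic with made-up hourly numbers; it is a sketch of the idea, not the paper's measurement methodology.

```python
# Energy drawn by a job per hour (kWh) and the grid's carbon intensity
# over the same hours (gCO2e/kWh); all values are invented for illustration.
energy_kwh = [1.2, 1.1, 0.9, 1.3]
intensity = [350.0, 410.0, 280.0, 500.0]

# Operational emissions: sum of energy * intensity per interval.
emissions_g = sum(e * i for e, i in zip(energy_kwh, intensity))
print(f"{emissions_g / 1000:.2f} kgCO2e")  # -> 1.77 kgCO2e
```

The same arithmetic explains why shifting the job to the low-intensity hour (280 gCO2e/kWh) reduces emissions even when total energy is unchanged.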
arXiv Detail & Related papers (2022-06-10T17:04:04Z) - Local-to-Global Self-Attention in Vision Transformers [130.0369761612812]
Transformers have demonstrated great potential in computer vision tasks.
Some recent Transformer models adopt a hierarchical design, where self-attentions are only computed within local windows.
This design significantly improves the efficiency but lacks global feature reasoning in early stages.
In this work, we design a multi-path structure of the Transformer, which enables local-to-global reasoning at multiple granularities in each stage.
arXiv Detail & Related papers (2021-07-10T02:34:55Z) - Machine Learning Framework for Quantum Sampling of Highly-Constrained, Continuous Optimization Problems [101.18253437732933]
We develop a generic, machine learning-based framework for mapping continuous-space inverse design problems into surrogate unconstrained binary optimization problems.
We showcase the framework's performance on two inverse design problems by optimizing (i) thermal emitter topologies for thermophotovoltaic applications and (ii) diffractive meta-gratings for highly efficient beam steering.
arXiv Detail & Related papers (2021-05-06T02:22:23Z) - Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive because it scales well as the retrieval set grows.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
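To see why dual encoders scale, note that the gallery embeddings can be precomputed once, so each query costs only a single matrix product; cross-attention models must instead re-run the transformer on every query-candidate pair. A minimal NumPy sketch with random stand-in embeddings:

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for the two independent encoders; in practice these are
# separate text and vision networks trained into a shared space.
rng = np.random.default_rng(0)
text_emb = l2_normalize(rng.normal(size=(1, 512)))         # one text query
image_embs = l2_normalize(rng.normal(size=(10_000, 512)))  # precomputed gallery

# Retrieval is one matrix product over the precomputed gallery.
scores = image_embs @ text_emb.T         # cosine similarities, shape (10000, 1)
top5 = np.argsort(-scores.ravel())[:5]   # indices of the best matches
```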
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.