Metrics and evaluations for computational and sustainable AI efficiency
- URL: http://arxiv.org/abs/2510.17885v1
- Date: Sat, 18 Oct 2025 03:30:15 GMT
- Title: Metrics and evaluations for computational and sustainable AI efficiency
- Authors: Hongyuan Liu, Xinyang Liu, Guosheng Hu
- Abstract summary: Current approaches fail to provide a holistic view, making it difficult to compare and optimise systems. We propose a unified and reproducible methodology for AI model inference that integrates computational and environmental metrics. Our framework provides pragmatic, carbon-aware evaluation by systematically measuring latency and throughput distributions, energy consumption, and location-adjusted carbon emissions.
- Score: 26.52588349722099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Artificial Intelligence (AI) has created unprecedented demands for computational power, yet methods for evaluating the performance, efficiency, and environmental impact of deployed models remain fragmented. Current approaches often fail to provide a holistic view, making it difficult to compare and optimise systems across heterogeneous hardware, software stacks, and numeric precisions. To address this gap, we propose a unified and reproducible methodology for AI model inference that integrates computational and environmental metrics under realistic serving conditions. Our framework provides a pragmatic, carbon-aware evaluation by systematically measuring latency and throughput distributions, energy consumption, and location-adjusted carbon emissions, all while maintaining matched accuracy constraints for valid comparisons. We apply this methodology to multi-precision models across diverse hardware platforms, from data-centre accelerators like the GH200 to consumer-level GPUs such as the RTX 4090, running on mainstream software stacks including PyTorch, TensorRT, and ONNX Runtime. By systematically categorising these factors, our work establishes a rigorous benchmarking framework that produces decision-ready Pareto frontiers, clarifying the trade-offs between accuracy, latency, energy, and carbon. The accompanying open-source code enables independent verification and facilitates adoption, empowering researchers and practitioners to make evidence-based decisions for sustainable AI deployment.
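The abstract's core pipeline (convert measured energy into location-adjusted carbon, then keep only configurations on the accuracy-constrained Pareto frontier) can be sketched as follows. This is a minimal illustration, not the paper's released code; the data tuples, function names, and grid-intensity figure are all assumptions for the example.

```python
def carbon_grams(energy_kwh: float, grid_gco2_per_kwh: float) -> float:
    """Location-adjusted operational carbon: energy times local grid intensity."""
    return energy_kwh * grid_gco2_per_kwh

def pareto_frontier(points, min_accuracy):
    """Keep configurations that meet the accuracy constraint and are not
    dominated on (latency, carbon) by any other qualifying configuration.

    Each point is (name, latency_ms, carbon_g, accuracy)."""
    ok = [p for p in points if p[3] >= min_accuracy]
    return [
        p for p in ok
        if not any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1] < p[1] or q[2] < p[2])
            for q in ok
        )
    ]

# Hypothetical measurements: (name, median latency ms, carbon g/1k requests, accuracy)
runs = [
    ("fp16-gh200",  10.0,  5.0, 0.90),
    ("fp32-4090",   20.0, 10.0, 0.95),   # dominated by fp16-gh200 on both axes
    ("int8-4090",   15.0,  4.0, 0.92),
]
frontier = pareto_frontier(runs, min_accuracy=0.90)
```

Here only the first and third configurations survive: the fp32 run is slower *and* higher-carbon than the fp16 run at the same accuracy floor, so it is dominated and pruned.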
Related papers
- Pareto Optimal Benchmarking of AI Models on ARM Cortex Processors for Sustainable Embedded Systems [2.584048323685663]
The research highlights a near-linear correlation between floating-point operations (FLOPs) and inference time, offering a reliable metric for estimating computational demands. We show how to balance trade-offs between energy consumption and model accuracy, ensuring that AI applications meet performance requirements without compromising sustainability. This work provides insights for developers, guiding them to design energy-efficient AI systems that deliver high performance in real-world applications.
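A near-linear FLOPs-to-latency relationship like the one this paper reports can be exploited with an ordinary least-squares fit: measure latency for a few models of known FLOP counts, fit a line, and predict latency for unmeasured models. The sketch below is illustrative only and assumes such a linear relationship holds on the target hardware.

```python
def fit_linear(flops, latency_ms):
    """Ordinary least squares for latency ≈ a * flops + b.

    Returns the slope a (ms per FLOP) and intercept b (fixed overhead, ms)."""
    n = len(flops)
    mean_x = sum(flops) / n
    mean_y = sum(latency_ms) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(flops, latency_ms)) \
        / sum((x - mean_x) ** 2 for x in flops)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical profiling data: GFLOPs per inference vs. measured latency.
slope, intercept = fit_linear([1.0, 2.0, 4.0], [3.0, 5.0, 9.0])
predicted_ms = slope * 3.0 + intercept  # estimate latency for an unprofiled 3-GFLOP model
```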
arXiv Detail & Related papers (2026-02-19T16:21:47Z) - AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models [2.7946918847372277]
We propose AI-CARE, an evaluation tool for reporting the energy consumption and carbon emissions of machine learning models. We demonstrate, through theoretical analysis and empirical validation, that carbon-aware benchmarking changes the relative ranking of models. Our proposal aims to shift the research community toward transparent, multi-objective evaluation and align ML progress with global sustainability goals.
arXiv Detail & Related papers (2026-02-17T21:52:48Z) - TokaMark: A Comprehensive Benchmark for MAST Tokamak Plasma Models [56.94569090844015]
TokaMark is a structured benchmark to evaluate AI models on real experimental data collected from the Mega Ampere Spherical Tokamak (MAST). TokaMark aims to accelerate progress in data-driven AI-based plasma modeling, contributing to the broader goal of achieving sustainable and stable fusion energy.
arXiv Detail & Related papers (2026-02-05T16:49:44Z) - Smart but Costly? Benchmarking LLMs on Functional Accuracy and Energy Efficiency [5.771786260272727]
We present a framework, BRACE, to benchmark Code Language Models on a unified scale of energy efficiency and functional correctness. We propose two rating methods: Concentric Incremental Rating Circles (CIRC) and Observation to Expectation Rating (OTER). Our analysis reveals models generally perform better in code summarization tasks, as they are not forced to generate a grammar-based, syntactically correct output.
arXiv Detail & Related papers (2025-11-10T23:44:48Z) - ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge [17.74343318260183]
ECORE is a framework that integrates multiple dynamic routing strategies. ECORE balances energy efficiency and detection performance based on object characteristics. Results demonstrate that our proposed context-aware routing strategies can reduce energy consumption and latency by 35% and 49%, respectively.
arXiv Detail & Related papers (2025-07-08T14:16:14Z) - WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads [8.545822371190125]
WattsOnAI is a comprehensive software toolkit for the measurement, analysis, and visualization of energy use, power draw, hardware performance, and carbon emissions across AI workloads. By seamlessly integrating with existing AI frameworks, WattsOnAI offers standardized reports and exports fine-grained time-series data. WattsOnAI encourages the research community to weigh environmental impact alongside raw performance of AI workloads and advances the shift toward more sustainable "Green AI" practices.
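Turning a fine-grained power time series, of the kind such toolkits export, into an energy figure is a simple numerical integration. The sketch below is a generic illustration (not WattsOnAI's API): it trapezoid-integrates sampled power in watts over timestamps in seconds and converts to kilowatt-hours.

```python
def energy_kwh(timestamps_s, power_w):
    """Trapezoidal integration of a power trace (watts) over time (seconds),
    converted from joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    joules = sum(
        0.5 * (power_w[i] + power_w[i + 1]) * (timestamps_s[i + 1] - timestamps_s[i])
        for i in range(len(power_w) - 1)
    )
    return joules / 3.6e6

# A steady 100 W draw sampled every half hour for one hour:
e = energy_kwh([0.0, 1800.0, 3600.0], [100.0, 100.0, 100.0])  # 0.1 kWh
```

Trapezoidal integration handles the uneven sampling intervals that real power-monitoring traces often have, which a naive `mean(power) * duration` would only approximate.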
arXiv Detail & Related papers (2025-06-25T15:24:45Z) - Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [58.50944604905037]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment [12.921833067052928]
This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrained devices at the network edge. We present a unified, cognition-preserving framework aimed at retaining multi-step reasoning under tight memory/compute budgets. We synthesize advances in efficient Transformer design, multimodal integration, hardware-aware compilation, privacy-preserving learning, and agentic tool use, and map them to edge-specific operating envelopes.
arXiv Detail & Related papers (2025-01-04T06:17:48Z) - Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables the use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z) - Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z) - Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation [82.85015548989223]
Pentathlon is a benchmark for holistic and realistic evaluation of model efficiency.
Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle.
It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption.
arXiv Detail & Related papers (2023-07-19T01:05:33Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to reconcile the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.