AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models
- URL: http://arxiv.org/abs/2602.16042v2
- Date: Mon, 23 Feb 2026 22:56:04 GMT
- Title: AI-CARE: Carbon-Aware Reporting Evaluation Metric for AI Models
- Authors: KC Santosh, Srikanth Baride, Rodrigue Rizk,
- Abstract summary: We propose AI-CARE, an evaluation tool for reporting the energy consumption and carbon emissions of machine learning models. We demonstrate, through theoretical analysis and empirical validation, that carbon-aware benchmarking changes the relative ranking of models. Our proposal aims to shift the research community toward transparent, multi-objective evaluation and align ML progress with global sustainability goals.
- Score: 2.7946918847372277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As machine learning (ML) continues its rapid expansion, the environmental cost of model training and inference has become a critical societal concern. Existing benchmarks overwhelmingly focus on standard performance metrics such as accuracy, BLEU, or mAP, while largely ignoring energy consumption and carbon emissions. This single-objective evaluation paradigm is increasingly misaligned with the practical requirements of large-scale deployment, particularly in energy-constrained environments such as mobile devices, developing regions, and climate-aware enterprises. In this paper, we propose AI-CARE, an evaluation tool for reporting the energy consumption and carbon emissions of ML models. In addition, we introduce the carbon-performance tradeoff curve, an interpretable tool that visualizes the Pareto frontier between performance and carbon cost. We demonstrate, through theoretical analysis and empirical validation on representative ML workloads, that carbon-aware benchmarking changes the relative ranking of models and encourages architectures that are simultaneously accurate and environmentally responsible. Our proposal aims to shift the research community toward transparent, multi-objective evaluation and align ML progress with global sustainability goals. The tool and documentation are available at https://github.com/USD-AI-ResearchLab/ai-care.
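The abstract's carbon-performance tradeoff curve plots each model's performance against its carbon cost and highlights the Pareto frontier. The sketch below is only an illustration of that idea, not the AI-CARE implementation; the model names and (accuracy, kgCO2eq) values are hypothetical.

```python
# Illustration only: given hypothetical (accuracy, kgCO2eq) measurements per model,
# keep the Pareto-optimal set, i.e. models for which no other model is both
# at least as accurate and no more carbon-costly (and strictly better in one).

models = {
    # name: (accuracy, kg CO2eq) -- made-up numbers for illustration
    "model_a": (0.91, 12.0),
    "model_b": (0.89, 3.5),
    "model_c": (0.93, 30.0),
    "model_d": (0.88, 9.0),
}

def pareto_front(points):
    """Return model names on the accuracy-vs-carbon Pareto frontier."""
    front = []
    for name, (acc, co2) in points.items():
        dominated = any(
            o_acc >= acc and o_co2 <= co2 and (o_acc > acc or o_co2 < co2)
            for other, (o_acc, o_co2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    # Sort by carbon cost so the frontier reads left to right on a tradeoff plot.
    return sorted(front, key=lambda n: points[n][1])

print(pareto_front(models))  # -> ['model_b', 'model_a', 'model_c']
```

Plotting the frontier points with carbon cost on the x-axis and performance on the y-axis gives the kind of tradeoff curve the paper describes.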
Related papers
- ML-EcoLyzer: Quantifying the Environmental Cost of Machine Learning Inference Across Frameworks and Hardware [0.0]
We present ML-EcoLyzer, a tool for measuring the carbon, energy, thermal, and water costs of machine learning inference. The tool supports both classical and modern models, applying adaptive monitoring and hardware-aware evaluation.
arXiv Detail & Related papers (2025-11-10T04:30:29Z) - Metrics and evaluations for computational and sustainable AI efficiency [26.52588349722099]
Current approaches fail to provide a holistic view, making it difficult to compare and optimise systems. We propose a unified and reproducible methodology for AI model inference that integrates computational and environmental metrics. Our framework provides pragmatic, carbon-aware evaluation by systematically measuring latency and throughput distributions, energy consumption, and location-adjusted carbon emissions (a minimal conversion sketch for location-adjusted emissions appears after this list).
arXiv Detail & Related papers (2025-10-18T03:30:15Z) - AIMeter: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads [7.7878942091873755]
AIMeter is a comprehensive software toolkit for the measurement, analysis, and visualization of energy use, power draw, hardware performance, and carbon emissions across AI workloads. By seamlessly integrating with existing AI frameworks, AIMeter offers standardized reports and exports fine-grained time-series data. It further enables in-depth correlation analysis between hardware metrics and model performance and thus facilitates bottleneck identification and performance enhancement.
arXiv Detail & Related papers (2025-06-25T15:24:45Z) - Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon & Energy estimation for LLMs [8.377809633825196]
We discuss the challenges of current approaches and present our evolving framework, R-ICE, which estimates prompt-level inference carbon emissions. Our promising validation results suggest that benchmark-based modelling holds great potential for inference emission estimation.
arXiv Detail & Related papers (2025-06-10T12:23:02Z) - Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View [2.5832043241251337]
FUEL is a framework for evaluating the environmental impact of large language models (LLMs). We uncover key insights and trade-offs in reducing carbon emissions by optimizing model size, quantization strategy, and hardware choice.
arXiv Detail & Related papers (2025-02-16T20:20:18Z) - CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs [0.0]
This paper analyzes the performance of Small Language Models (SLMs) and Vision Language Models (VLMs). To quantify the trade-off between model performance and carbon emissions, we introduce a novel metric called CEGI (Carbon Efficient Gain Index). Our findings suggest that the marginal gains in accuracy from larger models do not justify the substantial increase in carbon emissions.
arXiv Detail & Related papers (2024-12-03T17:32:47Z) - Power Hungry Processing: Watts Driving the Cost of AI Deployment? [74.19749699665216]
Generative, multi-purpose AI systems promise a unified approach to building machine learning (ML) models into technology.
This ambition of "generality" comes at a steep cost to the environment, given the amount of energy these systems require and the amount of carbon that they emit.
We measure deployment cost as the amount of energy and carbon required to perform 1,000 inferences on representative benchmark datasets using these models.
We conclude with a discussion around the current trend of deploying multi-purpose generative ML systems, and caution that their utility should be more intentionally weighed against increased costs in terms of energy and emissions.
arXiv Detail & Related papers (2023-11-28T15:09:36Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to balance the demand for high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z) - Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning [77.62876532784759]
Machine learning (ML) requires energy to carry out computations during the model training process.
The generation of this energy comes with an environmental cost in terms of greenhouse gas emissions, depending on the quantity of energy used and the energy source.
We present a survey of the carbon emissions of 95 ML models across time and different tasks in natural language processing and computer vision.
arXiv Detail & Related papers (2023-02-16T18:35:00Z) - Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model [72.65502770895417]
We quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle.
We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO2eq if we consider only the dynamic power consumption.
We conclude with a discussion regarding the difficulty of precisely estimating the carbon footprint of machine learning models.
arXiv Detail & Related papers (2022-11-03T17:13:48Z) - Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z)
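Several of the papers above report location-adjusted operational carbon, which is conventionally obtained by multiplying measured energy by the carbon intensity of the local electricity grid. The sketch below illustrates that conversion (and the per-1,000-inferences framing used in the deployment-cost study); the energy figure and grid intensities are hypothetical placeholders, not values taken from any of the papers.

```python
# Minimal sketch of location-adjusted operational carbon accounting:
# emissions (kgCO2eq) = energy (kWh) * grid carbon intensity (kgCO2eq/kWh).
# All numbers below are hypothetical; real values would come from power
# measurement tools and grid carbon-intensity data for the deployment region.

ENERGY_PER_1000_INFERENCES_KWH = 0.42  # hypothetical measured energy for 1,000 inferences

GRID_INTENSITY_KG_PER_KWH = {          # illustrative regional values, not authoritative
    "low_carbon_grid": 0.05,
    "average_grid": 0.40,
    "coal_heavy_grid": 0.80,
}

def operational_emissions(energy_kwh: float, region: str) -> float:
    """Convert measured energy into location-adjusted kgCO2eq."""
    return energy_kwh * GRID_INTENSITY_KG_PER_KWH[region]

for region in GRID_INTENSITY_KG_PER_KWH:
    kg = operational_emissions(ENERGY_PER_1000_INFERENCES_KWH, region)
    print(f"{region}: {kg:.3f} kgCO2eq per 1,000 inferences")
```

The same accounting can be made location- and time-specific, as the cloud carbon intensity paper proposes, by substituting marginal grid intensity data for the average values above.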