Related papers: Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis

URL: http://arxiv.org/abs/2511.04481v1
Date: Thu, 06 Nov 2025 15:59:59 GMT
Title: Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis
Authors: Lars Krupp, Daniel Geißler, Vishal Banwari, Paul Lukowicz, Jakob Karolus
Abstract summary: We show how different philosophies in web agent creation can severely impact the associated expended energy. We highlight a lack of transparency regarding disclosing model parameters and processes used for some web agents as a limiting factor when estimating energy consumption.
Score: 9.631189259234931
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Web agents, like OpenAI's Operator and Google's Project Mariner, are powerful agentic systems pushing the boundaries of Large Language Models (LLMs). They can autonomously interact with the internet at the user's behest, such as navigating websites, filling search masks, and comparing price lists. Though web agent research is thriving, induced sustainability issues remain largely unexplored. To highlight the urgency of this issue, we provide an initial exploration of the energy and $CO_2$ cost associated with web agents from both a theoretical perspective (via estimation) and an empirical perspective (by benchmarking). Our results show how different philosophies in web agent creation can severely impact the associated expended energy, and that more energy consumed does not necessarily equate to better results. We highlight a lack of transparency regarding disclosing model parameters and processes used for some web agents as a limiting factor when estimating energy consumption. Our work contributes towards a change in thinking of how we evaluate web agents, advocating for dedicated metrics measuring energy consumption in benchmarks.

Related papers

Evaluating End-User Device Energy Models in Sustainability Reporting of Browser-Based Web Services [0.0]
Sustainability reporting in web-based services increasingly relies on simplified energy and carbon models. This paper presents an empirical study evaluating how well such models reflect actual energy consumption. Results show that the commonly applied constant-power approximation can diverge substantially from measured energy.
arXiv Detail & Related papers (2025-10-14T14:25:26Z)
EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments [1.0026496861838445]
EconWebArena is a benchmark for evaluating autonomous agents on complex, multimodal economic tasks in realistic web environments. The benchmark comprises 360 curated tasks from 82 authoritative websites spanning domains such as macroeconomics, labor, finance, trade, and public policy.
arXiv Detail & Related papers (2025-06-09T18:39:48Z)
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [78.55946306325914]
We identify key reasoning skills essential for effective web agents. We reconstruct the agent's reasoning algorithms into chain-of-thought rationales. Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z)
WebThinker: Empowering Large Reasoning Models with Deep Research Capability [109.8504165631888]
WebThinker is a deep research agent that empowers LRMs to autonomously search the web, navigate among web pages, and draft reports during the reasoning process. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z)
An Illusion of Progress? Assessing the Current State of Web Agents [61.742657650092845]
We conduct a comprehensive and rigorous assessment of the current state of web agents. Results depict a very different picture of the competency of current agents, suggesting over-optimism in previously reported results. We introduce Online-Mind2Web, an online evaluation benchmark consisting of 300 diverse and realistic tasks spanning 136 websites.
arXiv Detail & Related papers (2025-04-02T05:51:29Z)
Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations [2.2765705959685234]
This study investigates the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines. We employ software-based power measurements to ensure ease of replication across diverse configurations, models, and datasets.
arXiv Detail & Related papers (2025-03-31T10:28:04Z)
Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption [4.614707355759162]
This study explores the energy and CO2 cost associated with web agents. Results show how different philosophies in web agent creation can severely impact the associated expended energy. Our work advocates a change in thinking when evaluating web agents, warranting dedicated metrics for energy consumption and sustainability.
arXiv Detail & Related papers (2025-02-25T06:58:40Z)
The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents. We propose an extended BrowserGym-based ecosystem for web agent research, which unifies existing benchmarks from the literature. We conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across 6 popular web agent benchmarks.
arXiv Detail & Related papers (2024-12-06T23:43:59Z)
Prospect Theory-inspired Automated P2P Energy Trading with Q-learning-based Dynamic Pricing [2.2463154358632473]
In this paper, we design an automated P2P energy market that takes user perception into account. We introduce a risk-sensitive Q-learning mechanism named Q-b Pricing and Risk-sensitivity (PQR), which learns the optimal price for sellers considering their perceived utility. Results based on real traces of energy consumption and production, as well as realistic prospect theory functions, show that our approach achieves a 26% higher perceived value for buyers.
arXiv Detail & Related papers (2022-08-26T16:45:40Z)
Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network. We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems [87.4519172058185]
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied. A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem. Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and the energy cost by 22.4%.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.