Related papers: How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations

How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations

URL: http://arxiv.org/abs/2510.22780v2
Date: Thu, 06 Nov 2025 21:03:13 GMT
Title: How Do AI Agents Do Human Work? Comparing AI and Human Workflows Across Diverse Occupations
Authors: Zora Zhiruo Wang, Yijia Shao, Omar Shaikh, Daniel Fried, Graham Neubig, Diyi Yang,
Abstract summary: We study how agents do human work by presenting the first direct comparison of human and agent workers.<n>We find that agents deliver results 88.3% faster and cost 90.4-96.2% less than humans.
Score: 112.57167042285437
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: AI agents are continually optimized for tasks related to human work, such as software engineering and professional writing, signaling a pressing trend with significant impacts on the human workforce. However, these agent developments have often not been grounded in a clear understanding of how humans execute work, to reveal what expertise agents possess and the roles they can play in diverse workflows. In this work, we study how agents do human work by presenting the first direct comparison of human and agent workers across multiple essential work-related skills: data analysis, engineering, computation, writing, and design. To better understand and compare heterogeneous computer-use activities of workers, we introduce a scalable toolkit to induce interpretable, structured workflows from either human or agent computer-use activities. Using such induced workflows, we compare how humans and agents perform the same tasks and find that: (1) While agents exhibit promise in their alignment to human workflows, they take an overwhelmingly programmatic approach across all work domains, even for open-ended, visually dependent tasks like design, creating a contrast with the UI-centric methods typically used by humans. (2) Agents produce work of inferior quality, yet often mask their deficiencies via data fabrication and misuse of advanced tools. (3) Nonetheless, agents deliver results 88.3% faster and cost 90.4-96.2% less than humans, highlighting the potential for enabling efficient collaboration by delegating easily programmable tasks to agents.

Related papers

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow.<n>We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks.<n>We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
arXiv Detail & Related papers (2026-01-28T13:49:18Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents.<n>We evaluate two leading copilot and agentic coding assistants.<n>Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models [14.45823275027527]
Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior.<n>We first show, through a human-subjects experiment, that humans exhibit diverse coordination and communication behavior in this domain.<n>We then show that our approach can effectively replicate trends from human teaming data and also capture behaviors that are not easily observed.
arXiv Detail & Related papers (2025-04-04T23:09:40Z)
Human-AI Collaboration: Trade-offs Between Performance and Preferences [6.521033978692547]
We show that agents who are more considerate of human actions are preferred over purely performance-maximizing agents.<n>We find evidence for inequality-aversion effects being a driver of human choices, suggesting that people prefer collaborative agents which allow them to meaningfully contribute to the team.
arXiv Detail & Related papers (2025-02-28T23:50:14Z)
CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation [70.3224918173672]
CowPilot is a framework supporting autonomous as well as human-agent collaborative web navigation.<n>It reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions.<n>CowPilot can serve as a useful tool for data collection and agent evaluation across websites.
arXiv Detail & Related papers (2025-01-28T00:56:53Z)
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [55.03911355902567]
We introduce TheAgentCompany, a benchmark for evaluating AI agents that interact with the world in similar ways to those of a digital worker.<n>We find that the most competitive agent can complete 30% of tasks autonomously.<n>This paints a nuanced picture on task automation with simulating LM agents in a setting a real workplace.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
ChatCollab: Exploring Collaboration Between Humans and AI Agents in Software Teams [1.3967206132709542]
ChatCollab's novel architecture allows agents - human or AI - to join collaborations in any role.<n>Using software engineering as a case study, we find that our AI agents successfully identify their roles and responsibilities.<n>In relation to three prior multi-agent AI systems for software development, we find ChatCollab AI agents produce comparable or better software in an interactive game development task.
arXiv Detail & Related papers (2024-12-02T21:56:46Z)
Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction [1.6574413179773757]
Large language models (LLMs) should be able to leverage their large breadth of understanding to interpret natural language commands. However, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance.
arXiv Detail & Related papers (2024-11-23T02:47:12Z)
WebArena: A Realistic Web Environment for Building Autonomous Agents [92.3291458543633]
We build an environment for language-guided agents that is highly realistic and reproducible. We focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains. We release a set of benchmark tasks focusing on evaluating the functional correctness of task completions.
arXiv Detail & Related papers (2023-07-25T22:59:32Z)
Human-Robot Team Coordination with Dynamic and Latent Human Task Proficiencies: Scheduling with Learning Curves [0.0]
We introduce a novel resource coordination that enables robots to explore the relative strengths and learning abilities of their human teammates. We generate and evaluate a robust schedule while discovering the latest individual worker proficiency. Results indicate that scheduling strategies favoring exploration tend to be beneficial for human-robot collaboration.
arXiv Detail & Related papers (2020-07-03T19:44:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.