Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows
- URL: http://arxiv.org/abs/2507.08149v2
- Date: Sat, 13 Sep 2025 14:59:53 GMT
- Title: Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows
- Authors: Valerie Chen, Ameet Talwalkar, Robert Brennan, Graham Neubig
- Abstract summary: We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
- Score: 60.04362496037186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers now have access to a growing array of increasingly autonomous AI tools for software development. While many studies examine copilots that provide chat assistance or code completions, evaluations of coding agents -- which can automatically write files and run code -- still rely on static benchmarks. We present the first controlled study of developer interactions with coding agents, characterizing how more autonomous AI tools affect productivity and experience. We evaluate two leading copilot and agentic coding assistants, recruiting participants who regularly use the former. Our results show agents can assist developers in ways that surpass copilots (e.g., completing tasks humans may not have accomplished) and reduce the effort required to finish tasks. Yet challenges remain for broader adoption, including ensuring users adequately understand agent behaviors. Our findings reveal how workflows shift with coding agents and how interactions differ from copilots, motivating recommendations for researchers and highlighting challenges in adopting agentic systems.
Related papers
- Are We All Using Agents the Same Way? An Empirical Study of Core and Peripheral Developers' Use of Coding Agents [4.744786007044749]
We study how core and peripheral developers use, review, modify, and verify agent-generated contributions prior to acceptance. A subset of peripheral developers use agents more often, delegating tasks evenly across bug fixing, feature addition, documentation, and testing. In contrast, core developers focus more on documentation and testing, yet their agentic PRs are frequently merged into the main/master branch.
arXiv Detail & Related papers (2026-01-27T22:50:01Z) - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z) - A Human Centric Requirements Engineering Framework for Assessing Github Copilot Output [0.0]
GitHub Copilot introduces new challenges in how these software tools address human needs. I analyzed GitHub Copilot's interaction with users through its chat interface. I established a human-centered requirements framework with clear metrics to evaluate these qualities.
arXiv Detail & Related papers (2025-08-05T21:33:23Z) - The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering [10.252332355171237]
This paper introduces AIDev, the first large-scale dataset capturing how such agents operate in the wild. Spanning over 456,000 pull requests by five leading agents, AIDev provides an unprecedented empirical foundation for studying autonomous teammates in software development. The dataset includes rich data on PRs, authorship, review timelines, code changes, and integration outcomes.
arXiv Detail & Related papers (2025-07-20T15:15:58Z) - From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments. We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases. Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
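The "progressive code masking" idea behind this benchmark can be illustrated with a toy sketch: hide the bodies of the first $n$ functions in a source file, leaving stubs an agent must re-implement. This is a simplified stand-in, not the AutoExperiment harness itself; the function name and placeholder text are invented for illustration.

```python
# Toy illustration of progressive code masking: blank out the bodies of the
# first n functions so an agent must reconstruct them (requires Python 3.9+
# for ast.unparse). Not the actual AutoExperiment implementation.
import ast

def mask_functions(source: str, n: int) -> str:
    tree = ast.parse(source)
    masked = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and masked < n:
            # Replace the body with a placeholder the agent must fill in.
            node.body = [ast.Expr(ast.Constant("TODO: re-implement")),
                         ast.Pass()]
            masked += 1
    return ast.unparse(tree)
```

As $n$ grows, more of the original implementation is withheld, which is what drives the rapid performance degradation the paper reports.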
arXiv Detail & Related papers (2025-06-24T15:39:20Z) - From Developer Pairs to AI Copilots: A Comparative Study on Knowledge Transfer [8.567835367628787]
With the rise of AI coding assistants, developers now not only work with human partners but also, as some claim, with AI pair programmers. To analyze knowledge transfer in both human-human and human-AI settings, we conducted an empirical study. We found a similar frequency of successful knowledge transfer episodes and overlapping topical categories across both settings.
arXiv Detail & Related papers (2025-06-05T09:13:30Z) - R&D-Agent: Automating Data-Driven AI Solution Building Through LLM-Powered Automated Research, Development, and Evolution [60.80016554091364]
R&D-Agent is a dual-agent framework for iterative exploration. The Researcher agent uses performance feedback to generate ideas, while the Developer agent refines code based on error feedback. R&D-Agent is evaluated on MLE-Bench and emerges as the top-performing machine learning engineering agent.
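The Researcher/Developer division of labor described above can be sketched as a minimal feedback loop. All names below are illustrative stubs, not the R&D-Agent API: the Researcher turns performance history into ideas, the Developer turns error feedback into code.

```python
# Minimal sketch of a dual-agent research/development loop in the style
# described by R&D-Agent. Class and function names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Idea:
    description: str

@dataclass
class Result:
    score: float
    errors: list = field(default_factory=list)

def researcher_propose(history):
    """Researcher: generate the next idea from performance feedback (stubbed)."""
    best = max((r.score for _, r in history), default=0.0)
    return Idea(f"variant improving on score {best:.2f}")

def developer_implement(idea, feedback):
    """Developer: refine code for the idea based on error feedback (stubbed)."""
    return f"# code for: {idea.description} (addressed {len(feedback)} errors)"

def run_experiment(code):
    """Stand-in for executing the generated solution and scoring it."""
    return Result(score=len(code) % 10 / 10.0)

def rd_loop(iterations=3):
    history, errors = [], []
    for _ in range(iterations):
        idea = researcher_propose(history)        # feedback -> idea
        code = developer_implement(idea, errors)  # errors -> code
        result = run_experiment(code)
        errors = result.errors
        history.append((idea, result))
    return history
```

The key design point is the split feedback channel: performance signals route to ideation, error signals route to implementation.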
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation [70.3224918173672]
CowPilot is a framework supporting autonomous as well as human-agent collaborative web navigation. It reduces the number of steps humans need to perform by allowing agents to propose next steps, while users are able to pause, reject, or take alternative actions. CowPilot can serve as a useful tool for data collection and agent evaluation across websites.
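The propose/confirm interaction pattern CowPilot describes can be sketched as a simple loop in which the agent proposes each step and the human accepts, rejects, or substitutes it. None of these names come from the CowPilot codebase; they are an assumed minimal interface.

```python
# Illustrative human-agent collaboration loop: the agent proposes a step,
# the user decides what actually happens. Hypothetical interface, not CowPilot's.
def collaborative_session(agent_propose, user_review, max_steps=10):
    """agent_propose(trace) -> proposed action;
    user_review(proposal) -> ('accept'|'reject'|'replace', override)."""
    trace = []
    for _ in range(max_steps):
        proposal = agent_propose(trace)
        decision, override = user_review(proposal)
        if decision == "accept":
            trace.append(proposal)
        elif decision == "replace":
            trace.append(override)  # user takes an alternative action
        # 'reject': skip this proposal entirely
        if proposal == "done":
            break
    return trace
```

The human effort saved comes from the default path: when proposals are good, the user only confirms rather than performing each step.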
arXiv Detail & Related papers (2025-01-28T00:56:53Z) - Towards Decoding Developer Cognition in the Age of AI Assistants [9.887133861477233]
We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools. We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time.
arXiv Detail & Related papers (2025-01-05T23:25:21Z) - TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We introduce TheAgentCompany, a benchmark for evaluating AI agents that interact with the world in ways similar to those of a digital worker. We find that the most competitive agent can complete 30% of tasks autonomously. This paints a nuanced picture of task automation when LM agents are simulated in a realistic workplace setting.
arXiv Detail & Related papers (2024-12-18T18:55:40Z) - ChatCollab: Exploring Collaboration Between Humans and AI Agents in Software Teams [1.3967206132709542]
ChatCollab's novel architecture allows agents - human or AI - to join collaborations in any role. Using software engineering as a case study, we find that our AI agents successfully identify their roles and responsibilities. In relation to three prior multi-agent AI systems for software development, we find ChatCollab AI agents produce comparable or better software in an interactive game development task.
arXiv Detail & Related papers (2024-12-02T21:56:46Z) - Does Co-Development with AI Assistants Lead to More Maintainable Code? A Registered Report [6.7428644467224]
This study aims to examine the influence of AI assistants on software maintainability.
In Phase 1, developers will add a new feature to a Java project, with or without the aid of an AI assistant.
Phase 2, a randomized controlled trial, will involve a different set of developers evolving randomly assigned Phase 1 projects, working without AI assistants.
arXiv Detail & Related papers (2024-08-20T11:48:42Z) - OpenHands: An Open Platform for AI Software Developers as Generalist Agents [109.8507367518992]
We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
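As a generic illustration of the sandboxed-execution idea (not the OpenHands implementation, which uses container-level isolation), code can be run in a separate interpreter process with a scratch working directory and a hard timeout:

```python
# Rough stand-in for sandboxed code execution: a child interpreter confined
# to a throwaway working directory with a timeout. Real agent platforms such
# as OpenHands rely on much stronger container-based isolation.
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Run Python source in a fresh process and return its stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=workdir, capture_output=True, text=True, timeout=timeout,
        )
    return proc.stdout

print(run_sandboxed("print(2 + 2)").strip())  # → 4
```

The timeout and capture of stdout/stderr are the minimum needed before an agent's generated code can be executed and its results fed back safely.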
arXiv Detail & Related papers (2024-07-23T17:50:43Z) - Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions [11.620351603683496]
GitHub's Copilot for Pull Requests (PRs) is a promising service aiming to automate various developer tasks related to PRs.
In this study, we examine 18,256 PRs in which parts of the descriptions were crafted by generative AI.
Our findings indicate that Copilot for PRs, though in its infancy, is seeing a marked uptick in adoption.
arXiv Detail & Related papers (2024-02-14T06:20:57Z) - Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.