Related papers: On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub

URL: http://arxiv.org/abs/2509.14745v1
Date: Thu, 18 Sep 2025 08:48:32 GMT
Title: On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub
Authors: Miku Watanabe, Hao Li, Yutaro Kashiwa, Brittany Reid, Hajimu Iida, Ahmed E. Hassan
Abstract summary: Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. We empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 open-source projects.
Score: 6.7302091035327285
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. However, little is known about the practical usefulness of these pull requests and the extent to which their contributions are accepted in real-world projects. In this paper, we empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 diverse open-source projects. Our analysis reveals that developers tend to rely on agents for tasks such as refactoring, documentation, and testing. The results indicate that 83.8% of these agent-assisted PRs are eventually accepted and merged by project maintainers, with 54.9% of the merged PRs integrated without further modification. The remaining 45.1% require additional changes and benefit from human revisions, especially for bug fixes, documentation, and adherence to project-specific standards. These findings suggest that while agent-assisted PRs are largely acceptable, they still benefit from human oversight and refinement.
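The 54.9% and 45.1% figures in the abstract are shares of the merged PRs, not of all 567 PRs. A minimal sketch of the conversion to shares of all PRs follows. The function name `breakdown` and the dictionary keys are illustrative, and the counts are estimates reconstructed from the rounded percentages, not numbers reported in the abstract.

```python
# Figures quoted in the abstract (rounded percentages).
TOTAL_PRS = 567
MERGED_RATE = 0.838            # share of all PRs that were merged
UNMODIFIED_OF_MERGED = 0.549   # share of merged PRs integrated as-is


def breakdown(total, merged_rate, unmodified_of_merged):
    """Return each outcome's share of ALL PRs and an approximate count."""
    shares = {
        "merged_unmodified": merged_rate * unmodified_of_merged,
        "merged_after_changes": merged_rate * (1 - unmodified_of_merged),
        "not_merged": 1 - merged_rate,
    }
    # Counts are estimates only: they are reconstructed from rounded rates.
    return {name: (share, round(total * share)) for name, share in shares.items()}


result = breakdown(TOTAL_PRS, MERGED_RATE, UNMODIFIED_OF_MERGED)
for name, (share, count) in result.items():
    print(f"{name}: {share:.1%} of all PRs (about {count})")
```

Under these assumptions, roughly 46% of all studied PRs were merged without further modification, which is a smaller share than the 54.9% figure might suggest when read in isolation.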

Related papers

Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDEV POP dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z)
Agentic Much? Adoption of Coding Agents on GitHub [6.395990525268647]
We present the first large-scale study of the adoption of coding agents on GitHub. We find an estimated adoption rate of 15.85% to 22.60%, which is very high for a technology only a few months old, and increasing. At the commit level, we find that commits assisted by coding agents are larger than commits only authored by human developers.
arXiv Detail & Related papers (2026-01-26T10:28:10Z)
How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests [0.0]
We analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
arXiv Detail & Related papers (2026-01-24T20:27:04Z)
Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub [5.808464460707249]
We conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. We first quantitatively characterize merged and not-merged PRs along four broad dimensions. Not-merged PRs tend to involve larger code changes, touch more files, and often do not pass the project's CI/CD pipeline validation.
arXiv Detail & Related papers (2026-01-21T17:12:46Z)
On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source [11.412808537439973]
We investigated project-level guidelines and developers' interactions with AI-assisted pull requests (PRs). We found that over 67.5% of AI-co-authored PRs originate from contributors without prior code ownership. In contrast to human-created PRs where non-owner developers receive the most feedback, AI-co-authored PRs from non-owners receive the least.
arXiv Detail & Related papers (2026-01-20T09:09:53Z)
TOM-SWE: User Mental Modeling For Software Engineering Agents [75.28749912645127]
ToM-SWE is a dual-agent architecture that pairs a primary software-engineering (SWE) agent with a lightweight theory-of-mind (ToM) partner agent. ToM-SWE infers user goals, constraints, and preferences from instructions and interaction history. In two software engineering benchmarks, ToM-SWE improves task success rates and user satisfaction.
arXiv Detail & Related papers (2025-10-24T16:09:51Z)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z)
How can we assess human-agent interactions? Case studies in software agent design [52.953425368394306]
We make two major steps towards the rigorous assessment of human-agent interactions. We propose PULSE, a framework for more efficient human-centric evaluation of agent designs. We deploy the framework on a large-scale web platform built around the open-source software agent OpenHands.
arXiv Detail & Related papers (2025-10-10T19:04:28Z)
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans [46.56091965723774]
Fine-tuning large language models for code editing has typically relied on mining commits and pull requests. We present AgentPack, a corpus of 1.3M code edits co-authored by Claude Code, OpenAI Codex, and Cursor Agent. We show that models fine-tuned on AgentPack can outperform models trained on prior human-only commit corpora.
arXiv Detail & Related papers (2025-09-26T05:28:22Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes [0.0]
We analyze pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage. We introduce PatchTrack, a tool that classifies whether ChatGPT patches were applied, not applied, or not suggested. A qualitative analysis of 89 pull requests with integrated patches revealed recurring patterns of structural integration, selective extraction, and iterative refinement.
arXiv Detail & Related papers (2025-05-12T16:09:33Z)
Unveiling Pitfalls: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution [22.03052751722933]
Python execution errors during the issue resolution phase correlate with lower resolution rates and increased reasoning overheads. We have identified the most prevalent errors, such as ModuleNotFoundError and TypeError, and highlighted particularly challenging errors like OSError and database-related issues.
arXiv Detail & Related papers (2025-03-16T06:24:51Z)
Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios [13.949319911378826]
This study evaluated 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues. No single agent dominated, with 170 issues unresolved, indicating room for improvement. Most agents maintained code reliability and security, avoiding new bugs or vulnerabilities. Some agents increased code complexity, while many reduced code duplication and minimized code smells.
arXiv Detail & Related papers (2024-10-16T11:33:57Z)
Towards Exception Safety Code Generation with Intermediate Representation Agents Framework [54.03528377384397]
Large Language Models (LLMs) often struggle with robust exception handling in generated code, leading to fragile programs that are prone to runtime errors. We propose Seeker, a novel multi-agent framework that enforces exception safety in LLM generated code through an Intermediate Representation (IR) approach. Seeker decomposes exception handling into five specialized agents: Scanner, Detector, Predator, Ranker, and Handler.
arXiv Detail & Related papers (2024-10-09T14:45:45Z)
Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions [11.620351603683496]
GitHub's Copilot for Pull Requests (PRs) is a promising service aiming to automate various developer tasks related to PRs. In this study, we examine 18,256 PRs in which parts of the descriptions were crafted by generative AI. Our findings indicate that Copilot for PRs, though in its infancy, is seeing a marked uptick in adoption.
arXiv Detail & Related papers (2024-02-14T06:20:57Z)
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub [79.31134731122462]
We introduce OpenAct benchmark to evaluate the open-domain task-solving capability, built on human expert consultation and repositories in GitHub. We present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains through autonomously integrating specialized tools from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z)
Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent. It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics. We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.