How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
- URL: http://arxiv.org/abs/2601.17581v2
- Date: Tue, 27 Jan 2026 06:14:36 GMT
- Title: How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests
- Authors: Daniel Ogenrwot, John Businge
- Abstract summary: We analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI coding agents are increasingly acting as autonomous contributors by generating and submitting pull requests (PRs). However, we lack empirical evidence on how these agent-generated PRs differ from human contributions, particularly in how they modify code and describe their changes. Understanding these differences is essential for assessing their reliability and impact on development workflows. Using the MSR 2026 Mining Challenge version of the AIDev dataset, we analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). We examine additions, deletions, commits, and files touched, and evaluate the consistency between PR descriptions and their diffs using lexical and semantic similarity. Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. They also exhibit slightly higher description-to-diff similarity across all measures. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
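The abstract's two core measures, Cliff's $\delta$ for effect size and lexical description-to-diff similarity, can be sketched as follows. This is a minimal illustration of how such measures are commonly computed, not the authors' exact pipeline; the commit counts and PR texts below are invented for demonstration, and the lexical measure is shown as simple token-set Jaccard overlap.

```python
from itertools import product

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y).

    Ranges from -1 to 1; |delta| >= ~0.474 is conventionally 'large'.
    """
    gt = sum(1 for x, y in product(xs, ys) if x > y)
    lt = sum(1 for x, y in product(xs, ys) if x < y)
    return (gt - lt) / (len(xs) * len(ys))

def jaccard_similarity(description, diff_text):
    """Lexical description-to-diff similarity as token-set Jaccard overlap."""
    a = set(description.lower().split())
    b = set(diff_text.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy per-PR commit counts (illustrative only, not the paper's data).
agentic_commits = [12, 18, 25, 9, 30]
human_commits = [2, 4, 3, 5, 1]
print(cliffs_delta(agentic_commits, human_commits))  # 1.0: every agentic count exceeds every human count

print(jaccard_similarity("fix null check in parser",
                         "parser: add null check before read"))  # 2/9, about 0.222
```

In practice, a study like this would compute $\delta$ over thousands of PRs per group and would likely use tokenization and embedding-based semantic similarity rather than raw whitespace splitting.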
Related papers
- How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses [6.061536429904841]
We conduct an empirical analysis of pull requests created by five AI coding agents using the AIDev dataset. We find that AI coding agents exhibit distinct PR description styles, which are associated with differences in reviewer engagement, response time, and merge outcomes.
arXiv Detail & Related papers (2026-02-19T05:06:31Z) - Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDEV POP dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z) - Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests [0.0]
We analyze 33,596 agent-generated PRs and 6,618 human PRs to compare code-change characteristics and message quality. Agents generate stronger commit-level messages but lag humans at PR-level summarization. These findings highlight a gap between agents' micro-level precision and macro-level communication.
arXiv Detail & Related papers (2026-01-24T23:33:07Z) - Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub [5.808464460707249]
We conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. We first quantitatively characterize merged and not-merged PRs along four broad dimensions. Not-merged PRs tend to involve larger code changes, touch more files, and often do not pass the project's CI/CD pipeline validation.
arXiv Detail & Related papers (2026-01-21T17:12:46Z) - On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source [11.412808537439973]
We investigated project-level guidelines and developers' interactions with AI-assisted pull requests (PRs). We found that over 67.5% of AI-co-authored PRs originate from contributors without prior code ownership. In contrast to human-created PRs, where non-owner developers receive the most feedback, AI-co-authored PRs from non-owners receive the least.
arXiv Detail & Related papers (2026-01-20T09:09:53Z) - Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests [5.885226503818935]
Pull request descriptions generated by AI coding agents are the primary channel for communicating code changes to human reviewers. We analyzed 23,247 agentic PRs across five agents using PR message-code inconsistency (PR-MCI). High-MCI PRs had 51.7% lower acceptance rates and took 3.5x longer to merge.
arXiv Detail & Related papers (2026-01-08T12:31:02Z) - Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z) - Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z) - Every Step Counts: Decoding Trajectories as Authorship Fingerprints of dLLMs [63.82840470917859]
We show that the decoding mechanism of dLLMs can be used as a powerful tool for model attribution. We propose a novel information extraction scheme called the Directed Decoding Map (DDM), which captures structural relationships between decoding steps and better reveals model-specific behaviors.
arXiv Detail & Related papers (2025-10-02T06:25:10Z) - DRBench: A Realistic Benchmark for Enterprise Deep Research [81.49694432639406]
DRBench is a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. We release 15 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance.
arXiv Detail & Related papers (2025-09-30T18:47:20Z) - AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans [46.56091965723774]
Fine-tuning large language models for code editing has typically relied on mining commits and pull requests. We present AgentPack, a corpus of 1.3M code edits co-authored by Claude Code, OpenAI Codex, and Cursor Agent. We show that models fine-tuned on AgentPack can outperform models trained on prior human-only commit corpora.
arXiv Detail & Related papers (2025-09-26T05:28:22Z) - On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub [6.7302091035327285]
Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. We empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 open-source projects.
arXiv Detail & Related papers (2025-09-18T08:48:32Z) - R&D-Agent: An LLM-Agent Framework Towards Autonomous Data Science [70.1638335489284]
High-level machine learning engineering (MLE) tasks remain labor-intensive and iterative. We introduce R&D-Agent, a comprehensive, decoupled framework that formalizes the machine learning engineering process. R&D-Agent decomposes MLE into two phases and six components, turning agent design for MLE into a principled, testable process.
arXiv Detail & Related papers (2025-05-20T06:07:00Z) - When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements [56.29265568399648]
We argue that disagreements prevent premature consensus and expand the explored solution space. Disagreements on task-critical steps can derail collaboration depending on the topology of solution paths.
arXiv Detail & Related papers (2025-02-21T02:24:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.