Let's Make Every Pull Request Meaningful: An Empirical Analysis of Developer and Agentic Pull Requests
- URL: http://arxiv.org/abs/2601.18749v1
- Date: Mon, 26 Jan 2026 18:16:10 GMT
- Title: Let's Make Every Pull Request Meaningful: An Empirical Analysis of Developer and Agentic Pull Requests
- Authors: Haruhiko Yoshioka, Takahiro Monno, Haruka Tokumasu, Taiki Wakamatsu, Yuki Ota, Nimmi Weeraddana, Kenichi Matsumoto
- Abstract summary: We conduct a large-scale empirical analysis of 40,214 PRs collected from the AIDev dataset. We extract 64 features across six families and fit statistical regression models to compare PR merge outcomes for human and agentic PRs. Our results show that submitter attributes dominate merge outcomes for both groups, while review-related features exhibit contrasting effects between human and agentic PRs.
- Score: 0.944838645453772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automatic generation of pull requests (PRs) using AI agents has become increasingly common. Although AI-generated PRs are fast and easy to create, their merge rates have been reported to be lower than those created by humans. In this study, we conduct a large-scale empirical analysis of 40,214 PRs collected from the AIDev dataset. We extract 64 features across six families and fit statistical regression models to compare PR merge outcomes for human and agentic PRs, as well as across three AI agents. Our results show that submitter attributes dominate merge outcomes for both groups, while review-related features exhibit contrasting effects between human and agentic PRs. The findings of this study provide insights into improving PR quality through human-AI collaboration.
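The abstract describes fitting statistical regression models over extracted PR features to compare merge outcomes. As an illustrative sketch only (the paper's actual 64 features and model specification are not given here), the setup can be mimicked with two hypothetical features, a submitter's prior merge rate and lines of code changed, and a minimal logistic regression trained by batch gradient descent. The generative rule, feature names, and coefficients below are assumptions for demonstration, not the paper's data or results.

```python
import math
import random

random.seed(0)

def make_pr(prior_rate, loc):
    # Hypothetical generative rule: experienced submitters with small
    # diffs are more likely to have their PRs merged.
    logit = 3.0 * prior_rate - 0.002 * loc - 0.5
    p = 1 / (1 + math.exp(-logit))
    return (prior_rate, loc, 1 if random.random() < p else 0)

# Synthetic "dataset": (prior merge rate, lines changed, merged?) tuples.
data = [make_pr(random.random(), random.randint(5, 2000)) for _ in range(2000)]

def fit_logistic(data, lr=0.1, epochs=300):
    """Minimal logistic regression: intercept plus two coefficients,
    fit by full-batch gradient descent on the log-loss."""
    w0 = w1 = w2 = 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for x1, x2, y in data:
            x2s = x2 / 1000.0  # crude scaling for the LOC feature
            p = 1 / (1 + math.exp(-(w0 + w1 * x1 + w2 * x2s)))
            err = p - y
            g0 += err
            g1 += err * x1
            g2 += err * x2s
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
    return w0, w1, w2

w0, w1, w2 = fit_logistic(data)
print(f"intercept={w0:.2f}  prior_merge_rate={w1:.2f}  loc_per_1k={w2:.2f}")
```

On this synthetic data the fitted coefficient on prior merge rate comes out positive and the coefficient on diff size negative, mirroring the sign of interest when the paper asks which feature families dominate merge outcomes; real analyses would use the full feature set and standard errors from a statistics package rather than this toy fit.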
Related papers
- AgentIR: Reasoning-Aware Retrieval for Deep Research Agents [76.29382561831105]
Deep Research agents generate explicit natural language reasoning before each search call. Reasoning-Aware Retrieval embeds the agent's reasoning trace alongside its query. DR-Synth generates Deep Research retriever training data from standard QA datasets. AgentIR-4B achieves 68% accuracy with the open-weight agent Tongyi-DeepResearch.
arXiv Detail & Related papers (2026-03-04T18:47:26Z) - How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses [6.061536429904841]
We conduct an empirical analysis of pull requests created by five AI coding agents using the AIDev dataset. We find that AI coding agents exhibit distinct PR description styles, which are associated with differences in reviewer engagement, response time, and merge outcomes.
arXiv Detail & Related papers (2026-02-19T05:06:31Z) - A Task-Level Evaluation of AI Agents in Open-Source Projects [0.0]
We present a comparative study of five autonomous coding agents using AIDev-pop. We evaluate agents' performance along three task-aware dimensions spanning the PR lifecycle. Our findings inform the selection and improvement of AI agents for effective integration into collaborative software engineering.
arXiv Detail & Related papers (2026-02-02T17:05:19Z) - Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDev-pop dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z) - Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub [5.808464460707249]
We conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. We first quantitatively characterize merged and not-merged PRs along four broad dimensions. Not-merged PRs tend to involve larger code changes, touch more files, and often do not pass the project's CI/CD pipeline validation.
arXiv Detail & Related papers (2026-01-21T17:12:46Z) - On Autopilot? An Empirical Study of Human-AI Teaming and Review Practices in Open Source [11.412808537439973]
We investigated project-level guidelines and developers' interactions with AI-assisted pull requests (PRs). We found that over 67.5% of AI-co-authored PRs originate from contributors without prior code ownership. In contrast to human-created PRs, where non-owner developers receive the most feedback, AI-co-authored PRs from non-owners receive the least.
arXiv Detail & Related papers (2026-01-20T09:09:53Z) - Early-Stage Prediction of Review Effort in AI-Generated Pull Requests [0.0]
We analyze 33,707 agent-authored PRs from the AIDev dataset across 2,807 repositories. We propose a Circuit Breaker triage model that predicts high-review-effort PRs at creation time.
arXiv Detail & Related papers (2026-01-02T17:18:01Z) - How can we assess human-agent interactions? Case studies in software agent design [52.953425368394306]
We make two major steps towards the rigorous assessment of human-agent interactions. We propose PULSE, a framework for more efficient human-centric evaluation of agent designs. We deploy the framework on a large-scale web platform built around the open-source software agent OpenHands.
arXiv Detail & Related papers (2025-10-10T19:04:28Z) - AutoPR: Let's Automate Your Academic Promotion! [50.929742814819036]
We introduce Automatic Promotion (AutoPR), a novel task that transforms research papers into accurate, engaging, and timely public content. PRAgent is a multi-agent framework that automates AutoPR in three stages: content extraction, collaborative synthesis, and platform-specific adaptation to optimize norms, tone, and tagging for maximum reach. Our results position AutoPR as a tractable, measurable research problem and provide a roadmap for scalable, impactful automated scholarly communication.
arXiv Detail & Related papers (2025-10-10T17:08:36Z) - LIMI: Less is More for Agency [49.63355240818081]
LIMI (Less Is More for Intelligent Agency) demonstrates that agency follows radically different development principles. We show that sophisticated agentic intelligence can emerge from minimal but strategically curated demonstrations of autonomous behavior. Our findings establish the Agency Efficiency Principle: machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.
arXiv Detail & Related papers (2025-09-22T10:59:32Z) - Graphs Meet AI Agents: Taxonomy, Progress, and Future Opportunities [117.49715661395294]
Data structurization can play a promising role by transforming intricate and disorganized data into well-structured forms. This survey presents a first systematic review of how graphs can empower AI agents.
arXiv Detail & Related papers (2025-06-22T12:59:12Z) - Designing AI-Agents with Personalities: A Psychometric Approach [2.854338743097065]
We introduce a methodology for assigning quantifiable and psychometrically validated personalities to AI-Agents. Across three studies, we evaluate its feasibility and limitations.
arXiv Detail & Related papers (2024-10-25T01:05:04Z) - ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - ADVISE: AI-accelerated Design of Evidence Synthesis for Global Development [2.6293574825904624]
This study develops an AI agent based on a bidirectional encoder representations from transformers (BERT) model.
We explore the effectiveness of the human-AI hybrid team in accelerating the evidence synthesis process.
Results show that incorporating the BERT-based AI agent into the human team can reduce the human screening effort by 68.5%.
arXiv Detail & Related papers (2023-05-02T01:29:53Z) - Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning [85.86440477005523]
We study more human-like RL agents which incorporate an established model of human-irrationality, the Rational Inattention (RI) model.
RIRL models the cost of cognitive information processing using mutual information.
We show that using RIRL yields a rich spectrum of new equilibrium behaviors that differ from those found under rational assumptions.
arXiv Detail & Related papers (2022-01-18T20:54:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.