Related papers: Why Agentic-PRs Get Rejected: A Comparative Study of Coding Agents

URL: http://arxiv.org/abs/2602.04226v1
Date: Wed, 04 Feb 2026 05:24:18 GMT
Title: Why Agentic-PRs Get Rejected: A Comparative Study of Coding Agents
Authors: Sota Nakashima, Yuta Ishimoto, Masanari Kondo, Shane McIntosh, Yasutaka Kamei
Abstract summary: We show that Pull Requests produced using coding agents (Agentic-PRs) are accepted less often than PRs that are not labeled as agentic (Human-PRs). A large proportion of rejected PRs lack explicit feedback, making their rejection reasons difficult to determine.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic coding -- software development workflows in which autonomous coding agents plan, implement, and submit code changes with minimal human involvement -- is rapidly gaining traction. Prior work has shown that Pull Requests (PRs) produced using coding agents (Agentic-PRs) are accepted less often than PRs that are not labeled as agentic (Human-PRs). The rejection reasons for a single agent (Claude Code) have been explored, but a comparison of how rejection reasons differ between Agentic-PRs generated by different agents has not yet been performed. This comparison is important since different coding agents are often used for different purposes, which can lead to agent-specific failure patterns. In this paper, we inspect 654 rejected PRs from the AIDev dataset covering five coding agents, as well as a human baseline. Our results show that seven rejection modes occur only in Agentic-PRs, including distrust of AI-generated code. We also observe agent-specific patterns (e.g., automated withdrawal of inactive PRs by Devin), reflecting differences in how agents are configured and used in practice. Notably, a large proportion of rejected PRs (67.9%) lack explicit reviewer feedback, making their rejection reasons difficult to determine. To mitigate this issue, we propose a set of heuristics that reduce the proportion of such cases, offering a practical preprocessing step for future studies of PR rejection in agentic coding.

Related papers

AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
Beyond Bug Fixes: An Empirical Investigation of Post-Merge Code Quality Issues in Agent-Generated Pull Requests [4.744786007044749]
We analyze 1,210 merged agent-generated bug-fix PRs from Python repositories in the AIDev dataset. Our results show that apparent differences in raw issue counts across agents largely disappear after normalizing by code churn. Across all agents, code smells dominate, particularly at critical and major severities, while bugs are less frequent but often severe.
arXiv Detail & Related papers (2026-01-27T22:55:05Z)
How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests [0.0]
We analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
arXiv Detail & Related papers (2026-01-24T20:27:04Z)
Where Do AI Coding Agents Fail? An Empirical Study of Failed Agentic Pull Requests in GitHub [5.808464460707249]
We conduct a large-scale study of 33k agent-authored PRs made by five coding agents across GitHub. We first quantitatively characterize merged and not-merged PRs along four broad dimensions. Not-merged PRs tend to involve larger code changes, touch more files, and often do not pass the project's CI/CD pipeline validation.
arXiv Detail & Related papers (2026-01-21T17:12:46Z)
Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub [4.409447722044799]
This study aims to characterize how autonomous coding agents contribute to software security in practice. We conduct a large-scale empirical analysis of agent-authored PRs using the AIDev dataset. We then analyze prevalence, acceptance outcomes, and review latency across autonomous agents, programming ecosystems, and types of code changes.
arXiv Detail & Related papers (2026-01-01T21:14:11Z)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation [87.47155146067962]
We provide a standardized evaluation harness that orchestrates parallel evaluations across hundreds of tasks. We conduct three-dimensional analysis spanning models, scaffolds, and benchmarks. Our analysis reveals surprising insights, such as higher reasoning effort reducing accuracy in the majority of runs.
arXiv Detail & Related papers (2025-10-13T22:22:28Z)
OAgents: An Empirical Study of Building Effective Agents [46.50371876218872]
We study the impact of popular design choices in key agent components in a fair and rigorous manner. Based on our findings, we build and open-source OAgents, a new foundation agent framework.
arXiv Detail & Related papers (2025-06-17T17:59:02Z)
Towards Adaptive Software Agents for Debugging [0.40964539027092917]
We propose an adaptive agentic design, where the number of agents and their roles are determined dynamically. Our initial evaluation shows that, with the adaptive design, the number of agents that are generated depends on the complexity of the buggy code. Regarding the effectiveness of the fix, we noticed an average improvement of 11% compared to the one-shot prompting.
arXiv Detail & Related papers (2025-04-25T12:48:08Z)
When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements [56.29265568399648]
We argue that disagreements prevent premature consensus and expand the explored solution space. Disagreements on task-critical steps can derail collaboration depending on the topology of solution paths.
arXiv Detail & Related papers (2025-02-21T02:24:43Z)
Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test-time. Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution. In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.