When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests
- URL: http://arxiv.org/abs/2602.19441v1
- Date: Mon, 23 Feb 2026 02:20:56 GMT
- Title: When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests
- Authors: Costain Nachuma, Minhaz Zibran
- Abstract summary: We study integration outcomes, resolution speed, and review-time collaboration signals using the public AIDev dataset. We find that reviewer engagement has the strongest correlation with successful integration, whereas larger change sizes and coordination-disrupting actions are associated with a lower likelihood of merging.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous coding agents increasingly contribute to software development by submitting pull requests on GitHub, yet little is known about how these contributions integrate into human-driven review workflows. We present a large empirical study of agent-authored pull requests using the public AIDev dataset, examining integration outcomes, resolution speed, and review-time collaboration signals. Using logistic regression with repository-clustered standard errors, we find that reviewer engagement has the strongest correlation with successful integration, whereas larger change sizes and coordination-disrupting actions, such as force pushes, are associated with a lower likelihood of merging. In contrast, iteration intensity alone provides limited explanatory power once collaboration signals are considered. A qualitative analysis further shows that successful integration occurs when agents engage in actionable review loops that converge toward reviewer expectations. Overall, our results highlight that the effective integration of agent-authored pull requests depends not only on code quality but also on alignment with established review and coordination practices.
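The modeling setup the abstract describes (logistic regression with repository-clustered standard errors) can be sketched as follows. This is a minimal illustration, not the paper's actual code: the predictor names (`reviewer_engagement`, `change_size`, `force_push`) follow the collaboration signals named in the abstract, and the data are synthetic with effect directions chosen to mirror the reported associations.

```python
# Hypothetical sketch: logistic regression on merge outcome with
# repository-clustered (cluster-robust) standard errors, as in the
# abstract. Data are synthetic; variable names are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "repo": rng.integers(0, 40, n),            # clustering unit: repository
    "reviewer_engagement": rng.poisson(3, n),  # e.g. count of review comments
    "change_size": rng.exponential(200.0, n),  # e.g. lines changed
    "force_push": rng.integers(0, 2, n),       # coordination-disrupting action
})

# Synthetic merge outcome mirroring the reported directions of association:
# engagement helps; large changes and force pushes hurt.
logit_p = (-1.0 + 0.5 * df["reviewer_engagement"]
           - 0.002 * df["change_size"] - 0.8 * df["force_push"])
df["merged"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit_p))).astype(int)

# Fit with standard errors clustered by repository, so that correlated
# review behavior within a repo does not understate uncertainty.
model = smf.logit(
    "merged ~ reviewer_engagement + change_size + force_push", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["repo"]}, disp=False)

print(model.summary())
```

With the synthetic effects above, the fitted coefficients recover a positive association for reviewer engagement and negative associations for change size and force pushes; clustering by `repo` widens the standard errors relative to the naive i.i.d. fit.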
Related papers
- How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses [6.061536429904841]
We conduct an empirical analysis of pull requests created by five AI coding agents using the AIDev dataset. We find that AI coding agents exhibit distinct PR description styles, which are associated with differences in reviewer engagement, response time, and merge outcomes.
arXiv Detail & Related papers (2026-02-19T05:06:31Z) - Scaling Agentic Verifier for Competitive Coding [66.11758166379092]
Large language models (LLMs) have demonstrated strong coding capabilities but still struggle to solve competitive programming problems correctly in a single attempt. Execution-based re-ranking offers a promising test-time scaling strategy, yet existing methods are constrained by either difficult test case generation or inefficient random input sampling. We propose Agentic Verifier, an execution-based agent that actively reasons about program behaviors and searches for highly discriminative test inputs.
arXiv Detail & Related papers (2026-02-04T06:30:40Z) - A Task-Level Evaluation of AI Agents in Open-Source Projects [0.0]
We present a comparative study of five autonomous coding agents using AIDev-pop. We evaluate agents' performance along three task-aware dimensions spanning the PR lifecycle. Our findings inform the selection and improvement of AI agents for effective integration into collaborative software engineering.
arXiv Detail & Related papers (2026-02-02T17:05:19Z) - Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDev-pop dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z) - Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub [4.409447722044799]
This study aims to characterize how autonomous coding agents contribute to software security in practice. We conduct a large-scale empirical analysis of agent-authored PRs using the AIDev dataset. We then analyze prevalence, acceptance outcomes, and review latency across autonomous agents, programming ecosystems, and types of code changes.
arXiv Detail & Related papers (2026-01-01T21:14:11Z) - Completion $\neq$ Collaboration: Scaling Collaborative Effort with Agents [48.95020665909723]
We argue for a shift from building and assessing task completion agents to developing collaborative agents. We introduce collaborative effort scaling, a framework that captures how an agent's utility grows with increasing user involvement.
arXiv Detail & Related papers (2025-10-29T17:47:18Z) - CoDA: Agentic Systems for Collaborative Data Visualization [57.270599188947294]
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection.
arXiv Detail & Related papers (2025-10-03T17:30:16Z) - Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries. We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z) - On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems, which autonomously plan and act, are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly incorporates feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.