Related papers: AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development

URL: http://arxiv.org/abs/2601.13597v1
Date: Tue, 20 Jan 2026 04:51:56 GMT
Title: AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development
Authors: Shyam Agarwal, Hao He, Bogdan Vasilescu
Abstract summary: Large language model (LLM)-based coding agents increasingly act as autonomous contributors that generate and merge pull requests. We present a longitudinal causal study of agent adoption in open-source repositories using staggered difference-in-differences with matched controls.
Score: 12.50615284537175
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM)-based coding agents increasingly act as autonomous contributors that generate and merge pull requests, yet their real-world effects on software projects are unclear, especially relative to widely adopted IDE-based AI assistants. We present a longitudinal causal study of agent adoption in open-source repositories using staggered difference-in-differences with matched controls. Using the AIDev dataset, we define adoption as the first agent-generated pull request and analyze monthly repository-level outcomes spanning development velocity (commits, lines added) and software quality (static-analysis warnings, cognitive complexity, duplication, and comment density). Results show large, front-loaded velocity gains only when agents are the first observable AI tool in a project; repositories with prior AI IDE usage experience minimal or short-lived throughput benefits. In contrast, quality risks are persistent across settings, with static-analysis warnings and cognitive complexity rising by roughly 18% and 35%, respectively, indicating sustained agent-induced complexity debt even when velocity advantages fade. These heterogeneous effects suggest diminishing returns to AI assistance and highlight the need for quality safeguards, provenance tracking, and selective deployment of autonomous agents. Our findings establish an empirical basis for understanding how agentic and IDE-based tools interact, and motivate research on balancing acceleration with maintainability in AI-integrated development workflows.
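The abstract names the study design (staggered difference-in-differences with matched controls, adoption defined as a repository's first agent-generated pull request, monthly repository-level outcomes) but not the estimator details. The following is only a simplified sketch under stated assumptions, not the authors' implementation: the `PullRequest` record, the integer month index, and the helpers `adoption_months`, `event_time`, `did_2x2`, and `staggered_did` are hypothetical, and averaging per-cohort two-group, two-period comparisons against never-adopting controls (weighted by cohort size) is a swapped-in simplification of staggered DiD.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class PullRequest:
    # Hypothetical record: one PR with its repository, month index, and provenance flag.
    repo: str
    month: int  # integer month index, e.g. months since the observation window began
    agent_generated: bool


def adoption_months(prs):
    """Adoption month per repo = month of its first agent-generated PR."""
    first = {}
    for pr in prs:
        if pr.agent_generated and (pr.repo not in first or pr.month < first[pr.repo]):
            first[pr.repo] = pr.month
    return first


def event_time(repo, month, adoption):
    """Months relative to adoption (negative = pre-adoption); None if the repo never adopts."""
    if repo not in adoption:
        return None
    return month - adoption[repo]


def did_2x2(outcomes, treated, controls, adopt_month):
    """Two-group, two-period DiD for one adoption cohort.

    outcomes maps (repo, month) -> monthly outcome (e.g. commits);
    treated are repos adopting in adopt_month; controls are matched never-adopters.
    Raises statistics.StatisticsError if a group has no pre- or post-period data.
    """
    def group_mean(repos, post):
        # Mean outcome of the given repos in the post (or pre) adoption period.
        return mean(v for (r, m), v in outcomes.items()
                    if r in repos and (m >= adopt_month) == post)

    treated_change = group_mean(treated, True) - group_mean(treated, False)
    control_change = group_mean(controls, True) - group_mean(controls, False)
    return treated_change - control_change


def staggered_did(outcomes, adoption, controls):
    """Cohort-size-weighted average of per-cohort 2x2 DiDs (a simplification)."""
    cohorts = defaultdict(set)
    for repo, month in adoption.items():
        cohorts[month].add(repo)
    total = sum(len(repos) for repos in cohorts.values())
    return sum(len(repos) * did_2x2(outcomes, repos, controls, month)
               for month, repos in cohorts.items()) / total
```

For example, if a treated repository's monthly commits average 11 before adoption and 21 after, while a matched control moves from 9 to 10, the estimate is (21 - 11) - (10 - 9) = 9. A real analysis would also need the matching step itself, pre-trend checks, and standard errors, none of which this sketch attempts.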

Related papers

Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDEV POP dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv (2026-01-29T22:06:58Z)
Developers in the Age of AI: Adoption, Policy, and Diffusion of AI Software Engineering Tools [0.0]
We study the usage patterns of 147 professional developers. We find no perceptual support for the Quality Paradox. Security concerns remain a moderate and statistically significant barrier to adoption.
arXiv (2026-01-29T05:56:35Z)
Early-Stage Prediction of Review Effort in AI-Generated Pull Requests [0.0]
We analyze 33,707 agent-authored PRs from the AIDev dataset across 2,807 repositories. We propose a Circuit Breaker triage model that predicts high-review-effort PRs at creation time.
arXiv (2026-01-02T17:18:01Z)
Security in the Age of AI Teammates: An Empirical Study of Agentic Pull Requests on GitHub [4.409447722044799]
This study aims to characterize how autonomous coding agents contribute to software security in practice. We conduct a large-scale empirical analysis of agent-authored PRs using the AIDev dataset. We then analyze prevalence, acceptance outcomes, and review latency across autonomous agents, programming ecosystems, and types of code changes.
arXiv (2026-01-01T21:14:11Z)
Toward Training Superintelligent Software Agents through Self-Play SWE-RL [66.11447353341926]
Self-play SWE-RL is a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
arXiv (2025-12-21T00:49:40Z)
The Evolution of Agentic AI in Cybersecurity: From Single LLM Reasoners to Multi-Agent Systems and Autonomous Pipelines [0.0]
Cybersecurity has become one of the earliest adopters of agentic AI. This survey presents a five-generation taxonomy of agentic AI in cybersecurity.
arXiv (2025-12-07T05:10:16Z)
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning [84.70211451226835]
Large Language Model (LLM) Agents are constrained by a dependency on human-curated data. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data. Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks.
arXiv (2025-11-20T05:01:57Z)
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv (2025-10-06T05:03:57Z)
ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants [21.35387344588118]
ASTRA is an automated system designed to uncover safety flaws in AI-driven code generation and security guidance systems. ASTRA finds 11-66% more issues than existing techniques and produces test cases that lead to 17% more effective alignment training.
arXiv (2025-08-05T21:57:52Z)
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training [67.895981259683]
General AI Agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence. Current agent systems are either closed-source or heavily reliant on a variety of paid APIs and proprietary tools. We present Cognitive Kernel-Pro, a fully open-source and (to the maximum extent) free multi-module agent framework.
arXiv (2025-08-01T08:11:31Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv (2025-07-10T20:12:54Z)
Towards Pervasive Distributed Agentic Generative AI -- A State of The Art [0.0]
The rapid advancement of intelligent agents and Large Language Models (LLMs) is reshaping the pervasive computing field. This survey outlines the architectural components of LLM agents and examines their deployment and evaluation across various scenarios. It highlights state-of-the-art agent deployment strategies and applications, including local and distributed execution on resource-constrained devices.
arXiv (2025-06-16T10:15:06Z)
The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective [3.0868637098088403]
Large-language-model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning. This paper presents the first comprehensive system-level analysis of AI agents, quantifying their resource usage, latency behavior, energy consumption, and test-time scaling strategies. Our findings reveal that while agents improve accuracy with increased compute, they suffer from rapidly diminishing returns, widening latency variance, and unsustainable infrastructure costs.
arXiv (2025-06-04T14:37:54Z)
Information Retrieval Induced Safety Degradation in AI Agents [52.15553901577888]
This study investigates how expanding retrieval access affects model reliability, bias propagation, and harmful content generation. Retrieval-enabled agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-enabled and increasingly autonomous AI systems.
arXiv (2025-05-20T11:21:40Z)
AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents [15.802600809497097]
This paper introduces AI2Agent, an end-to-end framework that automates AI project deployment through guideline-driven execution. We conducted experiments on 30 AI deployment cases, covering TTS, text-to-image generation, image editing, and other AI applications. Results show that AI2Agent significantly reduces deployment time and improves success rates.
arXiv (2025-03-31T10:58:34Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv (2025-03-31T07:31:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.