Will It Survive? Deciphering the Fate of AI-Generated Code in Open Source
- URL: http://arxiv.org/abs/2601.16809v1
- Date: Fri, 23 Jan 2026 15:00:46 GMT
- Title: Will It Survive? Deciphering the Fate of AI-Generated Code in Open Source
- Authors: Musfiqur Rahman, Emad Shihab
- Abstract summary: A prevailing hypothesis suggests that AI-generated code is "disposable", meaning it is merged quickly but discarded shortly thereafter. We investigate this hypothesis through survival analysis of 201 open-source projects, tracking over 200,000 code units authored by AI agents versus humans.
- Score: 3.6525095710982924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of AI agents as coding assistants into software development has raised questions about the long-term viability of AI agent-generated code. A prevailing hypothesis within the software engineering community suggests this code is "disposable", meaning it is merged quickly but discarded shortly thereafter. If true, organizations risk shifting maintenance burden from generation to post-deployment remediation. We investigate this hypothesis through survival analysis of 201 open-source projects, tracking over 200,000 code units authored by AI agents versus humans. Contrary to the disposable-code narrative, agent-authored code survives significantly longer: at the line level, it exhibits a 15.8 percentage-point lower modification rate and a 16% lower hazard of modification (HR = 0.842, p < 0.001). Modification profiles differ, however: agent-authored code shows modestly elevated corrective rates (26.3% vs. 23.0%), while human code shows higher adaptive rates. The effect sizes are small (Cramér's V = 0.116), and per-agent variation exceeds the agent-human gap. Turning to prediction, textual features can identify modification-prone code (AUC-ROC = 0.671), but predicting when modifications occur remains challenging (Macro F1 = 0.285), suggesting timing depends on external organizational dynamics. The bottleneck for agent-generated code may not be generation quality, but the organizational practices that govern its long-term evolution.
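The survival-analysis framing above can be made concrete with a Cox proportional-hazards fit. The sketch below is a minimal illustration under assumed data: the column names, synthetic durations, and censoring scheme are hypothetical stand-ins, not the authors' pipeline. The reused idea is modeling time-to-first-modification with right-censoring and reading the authorship effect off the hazard ratio.

```python
# Minimal Cox proportional-hazards sketch in the spirit of the paper's
# line-level modification study. All names and data are illustrative
# assumptions; only the modeling approach mirrors the abstract.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(42)
n = 5_000

# Synthetic "code units": duration = days until first modification,
# event = 1 if modified within the observation window, 0 if censored.
is_agent = rng.integers(0, 2, size=n)
# Assume agent-authored units last ~20% longer on average (HR < 1).
duration = rng.exponential(120.0 * np.where(is_agent, 1.2, 1.0))
window = rng.uniform(30, 365, size=n)        # per-unit observation window
event = (duration <= window).astype(int)     # 1 = modification observed
duration = np.minimum(duration, window)      # right-censor at window end

df = pd.DataFrame({"duration": duration,
                   "event": event,
                   "agent_authored": is_agent})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()               # coef, exp(coef) = hazard ratio, p-value
print(cph.hazard_ratios_)         # HR < 1 => lower hazard of modification
```

A hazard ratio below 1 on `agent_authored`, analogous to the paper's HR = 0.842, would read as a lower instantaneous risk of modification for agent-authored code.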
Related papers
- AICD Bench: A Challenging Benchmark for AI-Generated Code Detection [91.21422299346199]
AICD Bench is the most comprehensive benchmark for AI-generated code detection. It spans 2M examples, 77 models across 11 families, and 9 programming languages, including recent reasoning models.
arXiv Detail & Related papers (2026-02-02T13:24:14Z)
- Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDev POP dataset. Our results indicate that test-case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z)
- How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests [0.0]
We analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$; a minimal sketch of this statistic appears after this list) and show moderate differences in files touched and deleted lines. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open-source development.
arXiv Detail & Related papers (2026-01-24T20:27:04Z)
- AI builds, We Analyze: An Empirical Study of AI-Generated Build Code Quality [0.0]
The rapid adoption of AI coding agents for software development has raised important questions about the quality and maintainability of the code they produce. This data mining challenge focuses on AIDev, the first large-scale, openly available dataset capturing agent-authored pull requests from real-world GitHub repositories. We identified 364 maintainability- and security-related build smells across varying severity levels, indicating that AI-generated build code can introduce quality issues.
arXiv Detail & Related papers (2026-01-23T15:40:28Z)
- AI Code in the Wild: Measuring Security Risks and Ecosystem Shifts of AI-Generated Code in Modern Software [12.708926174194199]
We present the first large-scale empirical study of AI-generated code (AIGCode) in the wild. We build a high-precision detection pipeline and a benchmark to distinguish AIGCode from human-written code. This lets us label commits, files, and functions along a human/AI axis and trace how AIGCode moves through projects and vulnerability life cycles.
arXiv Detail & Related papers (2025-12-21T02:26:29Z)
- Agentic Refactoring: An Empirical Study of AI Coding Agents [9.698067623031909]
Agentic coding tools, such as OpenAI Codex, Claude Code, and Cursor, are transforming the software engineering landscape. These AI-powered systems function as autonomous teammates capable of planning and executing complex development tasks. There is a critical lack of empirical understanding regarding how agentic refactoring is utilized in practice, how it compares to human-driven refactoring, and what impact it has on code quality.
arXiv Detail & Related papers (2025-11-06T21:24:38Z)
- Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress-testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
- From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking [48.90371827091671]
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments. We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$ increases. Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
arXiv Detail & Related papers (2025-06-24T15:39:20Z)
- Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [58.419940585826744]
We introduce FairOPT, an algorithm for group-specific threshold optimization for probabilistic AI-text detectors. We partitioned data into subgroups based on attributes (e.g., text length and writing style) and implemented FairOPT to learn decision thresholds for each group to reduce discrepancy (a generic per-group threshold-search sketch appears after this list). Our framework paves the way for more robust classification in AI-generated content detection via post-processing.
arXiv Detail & Related papers (2025-02-06T21:58:48Z)
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- Evaluating Software Development Agents: Patch Patterns, Code Quality, and Issue Complexity in Real-World GitHub Scenarios [13.949319911378826]
This study evaluated 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues. No single agent dominated, and 170 issues went unresolved, indicating room for improvement. Most agents maintained code reliability and security, avoiding new bugs or vulnerabilities. While some agents increased code complexity, many reduced code duplication and minimized code smells.
arXiv Detail & Related papers (2024-10-16T11:33:57Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
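As referenced in the "How AI Coding Agents Modify Code" entry above, Cliff's $\delta$ is the nonparametric effect size behind the 0.5429 figure. A minimal sketch follows; the Poisson-distributed commit counts are purely illustrative assumptions, not the study's data.

```python
# Cliff's delta: (#{x > y} - #{x < y}) / (|xs| * |ys|), ranging over [-1, 1].
import numpy as np

def cliffs_delta(xs: np.ndarray, ys: np.ndarray) -> float:
    # Broadcast to compare every (x, y) pair; fine for modest sample sizes.
    diff = xs[:, None] - ys[None, :]
    return float((np.sum(diff > 0) - np.sum(diff < 0)) / (xs.size * ys.size))

rng = np.random.default_rng(0)
# Hypothetical commits per PR: agentic PRs skew higher, as the entry reports.
agentic = rng.poisson(18, size=1_000)   # illustrative means only
human = rng.poisson(5, size=1_000)
print(f"Cliff's delta = {cliffs_delta(agentic, human):.4f}")
```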
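The FairOPT entry above does not spell out the algorithm, so the following is only a generic per-group threshold search under the same premise (one decision threshold per subgroup, chosen to optimize a local metric). It is a hypothetical stand-in, not the authors' method.

```python
# Generic per-group threshold selection: for each subgroup, grid-search the
# decision threshold that maximizes F1 on that subgroup. Not FairOPT itself.
import numpy as np

def per_group_thresholds(scores, labels, groups, grid=None):
    """Return {group: threshold}, chosen independently per subgroup."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    out = {}
    for g in np.unique(groups):
        m = groups == g
        s, y = scores[m], labels[m]
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            pred = s >= t
            tp = np.sum(pred & (y == 1))
            prec = tp / max(pred.sum(), 1)
            rec = tp / max((y == 1).sum(), 1)
            f1 = 2 * prec * rec / max(prec + rec, 1e-12)
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        out[g] = float(best_t)
    return out

rng = np.random.default_rng(1)
groups = rng.choice(["short", "long"], size=2_000)  # e.g., length buckets
labels = rng.integers(0, 2, size=2_000)             # 1 = AI-generated
# Assume the detector scores long texts slightly higher on average.
scores = np.clip(0.4 * labels + rng.normal(0.3, 0.2, size=2_000)
                 + 0.05 * (groups == "long"), 0.0, 1.0)
print(per_group_thresholds(scores, labels, groups))
```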