When AI Agents Touch CI/CD Configurations: Frequency and Success
- URL: http://arxiv.org/abs/2601.17413v1
- Date: Sat, 24 Jan 2026 11:14:22 GMT
- Title: When AI Agents Touch CI/CD Configurations: Frequency and Success
- Authors: Taher A. Ghaleb
- Abstract summary: We analyze 8,031 agentic pull requests (PRs) from 1,605 GitHub repositories where AI agents touch YAML. When agents modify CI/CD, 96.77% target GitHub Actions. These results show that AI agents rarely modify CI/CD and focus mostly on GitHub Actions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI agents are increasingly used in software development, yet their interaction with CI/CD configurations is not well studied. We analyze 8,031 agentic pull requests (PRs) from 1,605 GitHub repositories where AI agents touch YAML configurations. CI/CD configuration files account for 3.25% of agent changes, varying by agent (Devin: 4.83%, Codex: 2.01%, p < 0.001). When agents modify CI/CD, 96.77% target GitHub Actions. Agentic PRs with CI/CD changes merge slightly less often than others (67.77% vs. 71.80%), except for Copilot, whose CI/CD changes merge 15.63 percentage points more often. Across 99,930 workflow runs, build success rates are comparable for CI/CD and non-CI/CD changes (75.59% vs. 74.87%), though three agents show significantly higher success when modifying CI/CD. These results show that AI agents rarely modify CI/CD and focus mostly on GitHub Actions, yet their configuration changes are as reliable as regular code. Copilot's strong CI/CD performance despite lower acceptance suggests emerging configuration specialization, with implications for agent training and DevOps automation.
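The abstract reports shares of CI/CD files among agent changes and gaps between merge and build-success rates. The sketch below illustrates how such figures could be computed. It is not the paper's method: the path patterns in `CI_PATTERNS` and the helper names `classify_ci_service`, `ci_share`, and `point_gap` are assumptions introduced here for illustration, based only on the conventional file locations of each CI service.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns (an assumption, not taken from the paper):
# conventional locations of CI/CD configuration files per service.
CI_PATTERNS = {
    "GitHub Actions": [".github/workflows/*.yml", ".github/workflows/*.yaml"],
    "Travis CI": [".travis.yml"],
    "GitLab CI": [".gitlab-ci.yml"],
    "CircleCI": [".circleci/config.yml"],
}


def classify_ci_service(path):
    """Return the CI/CD service a repository-relative path belongs to, or None."""
    for service, patterns in CI_PATTERNS.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return service
    return None


def ci_share(paths):
    """Percentage of the given paths that are CI/CD configuration files."""
    if not paths:
        return 0.0
    hits = sum(1 for p in paths if classify_ci_service(p) is not None)
    return 100.0 * hits / len(paths)


def point_gap(rate_a, rate_b):
    """Difference between two percentages, in percentage points."""
    return round(rate_a - rate_b, 2)


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    changed = [".github/workflows/ci.yml", "docker-compose.yml",
               "config/app.yaml", ".travis.yml"]
    print(ci_share(changed))
    # Rates quoted in the abstract: merge rates, then build success rates.
    print(point_gap(71.80, 67.77))
    print(point_gap(75.59, 74.87))
```

With the rates quoted in the abstract, the merge-rate gap is 4.03 percentage points in favor of non-CI/CD changes, and the build-success gap is 0.72 percentage points in favor of CI/CD changes.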
Related papers
- Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance [4.424336158797069]
This paper compares five popular AI-powered coding assistants (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code). Devin exhibits the only consistent positive trend in acceptance rate (+0.77% per week over 32 weeks). Our analysis suggests that the PR task type is a dominant factor influencing acceptance rates.
arXiv Detail & Related papers (2026-02-09T17:14:46Z)
- Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix-related PRs authored by five widely used AI coding agents from the AIDEV POP dataset. Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non-integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z)
- RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository [52.98970048197381]
RepoGenesis is the first multilingual benchmark for repository-level end-to-end web microservice generation. It consists of 106 repositories (60 Python, 46 Java) across 18 domains and 11 frameworks, with 1,258 API endpoints and 2,335 test cases verified. Results reveal that despite high AC (up to 73.91%) and DSR (up to 100%), the best-performing system achieves only 23.67% Pass@1 on Python and 21.45% on Java.
arXiv Detail & Related papers (2026-01-20T13:19:20Z)
- Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
- OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability [49.99934595922838]
Reliability is key to realizing the promise of autonomous UI-Agents. We develop OpenApps, a light-weight open-source ecosystem with six apps. We run more than 10,000 independent evaluations to study reliability across seven leading multimodal agents.
arXiv Detail & Related papers (2025-11-25T19:00:22Z)
- Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
- Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks. They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions. Current systems lack a framework that can comprehensively understand agent error in a modular and systemic way. We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z)
- On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub [6.7302091035327285]
Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. We empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 open-source projects.
arXiv Detail & Related papers (2025-09-18T08:48:32Z)
- Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents. We evaluate two leading copilot and agentic coding assistants. Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
- CI/CD Configuration Practices in Open-Source Android Apps: An Empirical Study [0.1433758865948252]
Continuous Integration and Continuous Delivery (CI/CD) is a well-established practice that automatically builds, tests, packages, and deploys software systems. Mobile apps have distinct characteristics with respect to CI/CD practices, such as testing on various emulators and deploying to app stores. We conduct an empirical study on CI/CD practices in 2,557 Android apps adopting four popular CI/CD services.
arXiv Detail & Related papers (2024-11-09T05:46:43Z)
- Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects [1.181206257787103]
This work presents the first empirical analysis of how continuous integration and delivery (CI/CD) configuration evolves for machine learning (ML) software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories. We developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits.
arXiv Detail & Related papers (2024-03-18T19:14:38Z)
- A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains.
Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration.
We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.