Related papers: When AI Agents Touch CI/CD Configurations: Frequency and Success

When AI Agents Touch CI/CD Configurations: Frequency and Success

URL: http://arxiv.org/abs/2601.17413v1
Date: Sat, 24 Jan 2026 11:14:22 GMT
Title: When AI Agents Touch CI/CD Configurations: Frequency and Success
Authors: Taher A. Ghaleb,
Abstract summary: We analyze 8,031 agentic pull requests (PRs) from 1,605 GitHub repositories where AI agents touch YAML.<n>When agents modify CI/CD, 96.77% target GitHub Actions.<n>These results show that AI agents rarely modify CI/CD and focus mostly on GitHub Actions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AI agents are increasingly used in software development, yet their interaction with CI/CD configurations is not well studied. We analyze 8,031 agentic pull requests (PRs) from 1,605 GitHub repositories where AI agents touch YAML configurations. CI/CD configuration files account for 3.25% of agent changes, varying by agent (Devin: 4.83%, Codex: 2.01%, p < 0.001). When agents modify CI/CD, 96.77% target GitHub Actions. Agentic PRs with CI/CD changes merge slightly less often than others (67.77% vs. 71.80%), except for Copilot, whose CI/CD changes merge 15.63 percentage points more often. Across 99,930 workflow runs, build success rates are comparable for CI/CD and non-CI/CD changes (75.59% vs. 74.87%), though three agents show significantly higher success when modifying CI/CD. These results show that AI agents rarely modify CI/CD and focus mostly on GitHub Actions, yet their configuration changes are as reliable as regular code. Copilot's strong CI/CD performance despite lower acceptance suggests emerging configuration specialization, with implications for agent training and DevOps automation.

Related papers

Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance [4.424336158797069]
This paper compares five popular AI-powered coding assistants (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code)<n>Devin exhibits the only consistent positive trend in acceptance rate (+0.77% per week over 32 weeks)<n>Our analysis suggests that the PR task type is a dominant factor influencing acceptance rates.
arXiv Detail & Related papers (2026-02-09T17:14:46Z)
Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study [5.127121704630949]
We analyze 8,106 fix related PRs authored by five widely used AI coding agents from the AIDEV POP dataset.<n>Our results indicate that test case failures and prior resolution of the same issues by other PRs are the most common causes of non integration.
arXiv Detail & Related papers (2026-01-29T22:06:58Z)
RepoGenesis: Benchmarking End-to-End Microservice Generation from Readme to Repository [52.98970048197381]
RepoGenesis is the first multilingual benchmark for repository-level end-to-end web microservice generation.<n>It consists of 106 repositories (60 Python, 46 Java) across 18 domains and 11 frameworks, with 1,258 API endpoints and 2,335 test cases verified.<n>Results reveal that despite high AC (up to 73.91%) and DSR (up to 100%), the best-performing system achieves only 23.67% Pass@1 on Python and 21.45% on Java.
arXiv Detail & Related papers (2026-01-20T13:19:20Z)
Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties.<n>We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains.<n>We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability [49.99934595922838]
Reliability is key to realizing the promise of autonomous UI-Agents.<n>We develop OpenApps, a light-weight open-source ecosystem with six apps.<n>We run more than 10,000 independent evaluations to study reliability across seven leading multimodal agents.
arXiv Detail & Related papers (2025-11-25T19:00:22Z)
Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents.<n>TraitBasis learns directions in activation space corresponding to steerable user traits.<n>We observe on average a 2%-30% performance degradation on $tau$-Trait across frontier models.
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
Where LLM Agents Fail and How They can Learn From Failures [62.196870049524364]
Large Language Model (LLM) agents have shown promise in solving complex, multi-step tasks.<n>They amplify vulnerability to cascading failures, where a single root-cause error propagates through subsequent decisions.<n>Current systems lack a framework that can comprehensively understand agent error in a modular and systemic way.<n>We introduce the AgentErrorTaxonomy, a modular classification of failure modes spanning memory, reflection, planning, action, and system-level operations.
arXiv Detail & Related papers (2025-09-29T18:20:27Z)
On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub [6.7302091035327285]
Large language models (LLMs) are increasingly being integrated into software development processes.<n>The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice.<n>We empirically study 567 GitHub pull requests (PRs) generated using Claude Code, an agentic coding tool, across 157 open-source projects.
arXiv Detail & Related papers (2025-09-18T08:48:32Z)
Code with Me or for Me? How Increasing AI Automation Transforms Developer Workflows [60.04362496037186]
We present the first controlled study of developer interactions with coding agents.<n>We evaluate two leading copilot and agentic coding assistants.<n>Our results show agents can assist developers in ways that surpass copilots.
arXiv Detail & Related papers (2025-07-10T20:12:54Z)
CI/CD Configuration Practices in Open-Source Android Apps: An Empirical Study [0.1433758865948252]
Continuous Integration and Continuous Delivery (CI/CD) is a well-established practice that automatically builds, tests, packages, and deploys software systems.<n>Mobile apps have distinct characteristics with respect to CI/CD practices, such as testing on various emulators and deploying to app stores.<n>We conduct an empirical study on CI/CD practices in 2,557 Android apps adopting four popular CI/CD services.
arXiv Detail & Related papers (2024-11-09T05:46:43Z)
Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects [1.181206257787103]
This work presents the first empirical analysis of how continuous integration and delivery (CI/CD) configuration evolves for machine learning (ML) software systems.<n>We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories.<n>We developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits.
arXiv Detail & Related papers (2024-03-18T19:14:38Z)
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains. Specifically, we build a framework named Dynamic LLM-Powered Agent Network ($textDyLAN$) for LLM-powered agent collaboration. We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.