Related papers: Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects

URL: http://arxiv.org/abs/2403.12199v4
Date: Sun, 23 Feb 2025 17:37:19 GMT
Title: Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects
Authors: Dhia Elhaq Rzig, Alaa Houerbi, Rahul Ghanshyam Chavan, Foyzul Hassan
Abstract summary: This work presents the first empirical analysis of how continuous integration and delivery (CI/CD) configuration evolves for machine learning (ML) software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories. We developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits.
Score: 1.181206257787103
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The growing popularity of machine learning (ML) and the integration of ML components with other software artifacts has led to the use of continuous integration and delivery (CI/CD) tools, such as Travis CI, GitHub Actions, etc. that enable faster integration and testing for ML projects. Such CI/CD configurations and services require synchronization during the life cycle of the projects. Several works discussed how CI/CD configuration and services change during their usage in traditional software systems. However, there is very limited knowledge of how CI/CD configuration and services change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves for ML software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories in ML projects and devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy and minimal changes related to performance and maintainability compared to general open-source projects. Additionally, the co-evolution analysis identified that CI/CD configurations, in many cases, changed unnecessarily due to bad practices such as the direct inclusion of dependencies and a lack of usage of standardized testing frameworks. More practices were found through the change patterns analysis consisting of using deprecated settings and reliance on a generic build language. Finally, our developer's expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations.

Related papers

An ML-based Approach to Predicting Software Change Dependencies: Insights from an Empirical Study on OpenStack [0.41232474244672235]
In modern software systems, dependencies often span multiple components across teams, creating challenges for development and deployment. We propose a semi-automated approach that leverages two ML models. Our proposed models demonstrate strong performance, achieving average AUC scores of 79.33% and 91.89%, and Brier scores of 0.11 and 0.014, respectively.
arXiv Detail & Related papers (2025-08-07T05:16:29Z)
CIgrate: Automating CI Service Migration with Large Language Models [2.3020018305241337]
This report presents a study in which we aim to assess whether CI migration can be improved using Large Language Models (LLMs). LLMs have demonstrated strong capabilities in code generation and transformation tasks. We propose CIgrate, an LLM-based framework for automatically migrating CI configurations.
arXiv Detail & Related papers (2025-07-27T19:51:37Z)
From First Use to Final Commit: Studying the Evolution of Multi-CI Service Adoption [0.0]
We analyze the historical CI adoption of 18,924 Java projects hosted on GitHub between January 2008 and December 2024. Our analysis shows that the use of multiple CI services within the same project is a recurring pattern observed in nearly one in five projects.
arXiv Detail & Related papers (2025-07-27T01:32:22Z)
Centrality Change Proneness: an Early Indicator of Microservice Architectural Degradation [48.55946052680251]
The study of temporal networks has emerged as a way to describe and analyze evolving networks. Previous research has explored how software metrics such as size, complexity, and quality are related to microservice centrality. This study investigates whether temporal centrality metrics can provide insight into the early detection of architectural degradation.
arXiv Detail & Related papers (2025-06-09T12:22:12Z)
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
Specifications: The missing link to making the development of LLM systems an engineering discipline [65.10077876035417]
We discuss the progress the field has made so far through advances like structured outputs, process supervision, and test-time compute. We outline several future directions for research to enable the development of modular and reliable LLM-based systems.
arXiv Detail & Related papers (2024-11-25T07:48:31Z)
CI/CD Configuration Practices in Open-Source Android Apps: An Empirical Study [0.1433758865948252]
We conduct an empirical study on Continuous Integration and Continuous Delivery practices in 2,564 Android apps. We observe a lack of commonality and standards across projects and services, leading to complex YML configurations. Our study emphasizes the necessity for automation and AI-powered tools to improve CI/CD processes for mobile applications.
arXiv Detail & Related papers (2024-11-09T05:46:43Z)
Adoption and Adaptation of CI/CD Practices in Very Small Software Development Entities: A Systematic Literature Review [0.0]
This study presents a systematic review on the adoption of Continuous Integration and Continuous Delivery (CI/CD) practices in Very Small Entities (VSEs) in software development. The research analyzes 13 selected studies to identify common CI/CD practices, characterize the specific limitations of VSEs, and explore strategies for adapting these practices to small-scale environments.
arXiv Detail & Related papers (2024-09-29T04:43:15Z)
Open-CD: A Comprehensive Toolbox for Change Detection [59.79011759027916]
Open-CD is a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. It gradually evolves into a unified platform that covers many popular change detection methods and contemporary modules.
arXiv Detail & Related papers (2024-07-22T01:04:16Z)
Standardizing Structural Causal Models [80.21199731817698]
We propose internally-standardized structural causal models (iSCMs) for benchmarking algorithms. By construction, iSCMs are not $\operatorname{Var}$-sortable, and as we show experimentally, not $\operatorname{R}^2$-sortable either for commonly-used graph families.
arXiv Detail & Related papers (2024-06-17T14:52:21Z)
Detecting Continuous Integration Skip: A Reinforcement Learning-based Approach [0.4297070083645049]
Continuous Integration (CI) practices facilitate the seamless integration of code changes by employing automated building and testing processes. Some frameworks, such as Travis CI and GitHub Actions have significantly contributed to simplifying and enhancing the CI process. Developers continue to encounter difficulties in accurately flagging commits as either suitable for CI execution or as candidates for skipping.
arXiv Detail & Related papers (2024-05-15T18:48:57Z)
DevBench: A Comprehensive Benchmark for Software Development [72.24266814625685]
DevBench is a benchmark that evaluates large language models (LLMs) across various stages of the software development lifecycle. Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
Toward Automatically Completing GitHub Workflows [16.302521048148748]
We present GH-WCOM (GitHub Workflow COMpletion), a Transformer-based approach supporting developers in writing a specific type of CI/CD pipeline, namely GitHub workflows. Our empirical study shows that GH-WCOM provides up to 34.23% correct predictions.
arXiv Detail & Related papers (2023-08-31T14:53:00Z)
Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
OpenICL: An Open-Source Framework for In-context Learning [48.75452105457122]
We introduce OpenICL, an open-source toolkit for In-context Learning (ICL) and large language model evaluation. OpenICL is research-friendly with a highly flexible architecture that users can easily combine different components to suit their needs. The effectiveness of OpenICL has been validated on a wide range of NLP tasks, including classification, QA, machine translation, and semantic parsing.
arXiv Detail & Related papers (2023-03-06T06:20:25Z)
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge). The CK concept is to decompose research projects into reusable components that encapsulate research artifacts. The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.