Exploring and Unleashing the Power of Large Language Models in CI/CD Configuration Translation
- URL: http://arxiv.org/abs/2511.01316v1
- Date: Mon, 03 Nov 2025 08:01:09 GMT
- Title: Exploring and Unleashing the Power of Large Language Models in CI/CD Configuration Translation
- Authors: Chong Wang, Chen Zhang, Jiajun Wu, Wunan Guo, Jianfeng Qu, Yewen Tian, Yang Liu
- Abstract summary: Continuous Integration (CI) is a cornerstone of modern collaborative software development. With the advent of large language models (LLMs), recent advances in software engineering highlight their potential for CI configuration translation. We present a study on LLM-based CI configuration translation, focusing on the migration from Travis CI to GitHub Actions.
- Score: 22.867758531615248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous Integration (CI) is a cornerstone of modern collaborative software development, and numerous CI platforms are available. Differences in maintenance overhead, reliability, and integration depth with code-hosting platforms make migration between CI platforms a common practice. A central step in migration is translating CI configurations, which is challenging due to the intrinsic complexity of CI configurations and the need to understand semantic differences and relationships across CI platforms. With the advent of large language models (LLMs), recent advances in software engineering highlight their potential for CI configuration translation. In this paper, we present a study on LLM-based CI configuration translation, focusing on the migration from Travis CI to GitHub Actions. First, using 811 migration records, we quantify the effort involved and find that developers read an average of 38 lines of Travis configuration and write 58 lines of GitHub Actions configuration, with nearly half of the migrations requiring multiple commits. We further analyze translations produced by each of four LLMs and identify 1,121 issues grouped into four categories: logic inconsistencies (38%), platform discrepancies (32%), environment errors (25%), and syntax errors (5%). Finally, we evaluate three enhancement strategies and show that combining guideline-based prompting with iterative refinement achieves the best performance, reaching a Build Success Rate of 75.5%, nearly a threefold improvement over GPT-4o with a basic prompt.
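The abstract names guideline-based prompting combined with iterative refinement as the best-performing strategy but does not spell out the mechanics. The following is a minimal sketch of what such a loop could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the `GUIDELINES` text, the function names `build_prompt` and `refine`, the round limit, and the stand-ins `fake_llm` and `check_required_keys`, which replace a real model call and a real build check.

```python
from typing import Callable, List, Tuple

# Hypothetical guidelines; the paper's actual guideline text is not reproduced here.
GUIDELINES = [
    "Map Travis `script` commands to `run` steps inside a GitHub Actions job.",
    "Every job needs a `runs-on` runner label.",
]


def build_prompt(travis_yaml: str, feedback: List[str]) -> str:
    """Assemble a prompt from the guidelines, the source config, and prior feedback."""
    parts = ["Translate this Travis CI configuration to GitHub Actions.", "Guidelines:"]
    parts += [f"- {g}" for g in GUIDELINES]
    parts += ["Travis configuration:", travis_yaml]
    if feedback:
        parts.append("Problems found in the previous attempt:")
        parts += [f"- {f}" for f in feedback]
    return "\n".join(parts)


def refine(
    travis_yaml: str,
    llm: Callable[[str], str],
    validate: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Ask for a translation, validate it, and feed problems back until none remain."""
    feedback: List[str] = []
    candidate = ""
    for round_no in range(1, max_rounds + 1):
        candidate = llm(build_prompt(travis_yaml, feedback))
        feedback = validate(candidate)
        if not feedback:
            return candidate, round_no
    # Round limit reached: return the last attempt even though problems remain.
    return candidate, max_rounds


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: omits `runs-on` until it receives feedback."""
    runner = "    runs-on: ubuntu-latest\n" if "Problems found" in prompt else ""
    return "on: push\njobs:\n  build:\n" + runner + "    steps:\n      - run: make test\n"


def check_required_keys(workflow: str) -> List[str]:
    """Stand-in validator: a crude substring check, not a real build or YAML parse."""
    required = ("on:", "jobs:", "runs-on:", "steps:")
    return [f"missing `{key}`" for key in required if key not in workflow]


if __name__ == "__main__":
    source = "language: python\nscript: make test\n"
    workflow, rounds = refine(source, fake_llm, check_required_keys)
    print(f"accepted after {rounds} round(s)")
    print(workflow)
```

In a real setting the validator would presumably be an actual workflow run or linter whose errors become the feedback, and `llm` would wrap a model API; both are passed in as callables here so the loop itself stays independent of those choices.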
Related papers
- Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages [61.18573330164572]
System prompts provide a lightweight yet powerful mechanism for conditioning large language models (LLMs) at inference time. This paper presents a comprehensive study of how different system prompts steer models toward accurate and robust cross-lingual behavior.
arXiv Detail & Related papers (2025-12-02T14:54:54Z)
- VisCoder2: Building Multi-Language Visualization Coding Agents [63.63232038173407]
We introduce three complementary resources for advancing visualization coding agents. VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models.
arXiv Detail & Related papers (2025-10-24T18:03:57Z)
- Can LLMs Write CI? A Study on Automatic Generation of GitHub Actions Configurations [0.0]
Continuous Integration services, such as GitHub Actions, require developers to write YAML-based configurations. Despite the increasing use of Large Language Models (LLMs) to automate software engineering tasks, their ability to generate CI configurations remains underexplored. This paper presents a preliminary study evaluating six LLMs for generating GitHub Actions configurations from natural language descriptions.
arXiv Detail & Related papers (2025-07-23T03:18:04Z)
- NL in the Middle: Code Translation with LLMs and Intermediate Representations [56.77064674776534]
Large language models (LLMs) produce buggy code translations. One promising avenue to improve translation accuracy is through intermediate representations. We investigate whether LLM-based code translation can benefit from intermediate representations.
arXiv Detail & Related papers (2025-07-11T14:29:21Z)
- Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees [0.03994567502796063]
We introduce GG (Guaranteed Guess), an ISA-centric transpilation pipeline that combines the translation power of pre-trained large language models with the rigor of established software testing constructs. Our method generates candidate translations using an LLM from one ISA to another, and embeds such translations within a software-testing framework to build quantifiable confidence in the translation. We evaluate our GG approach over two diverse datasets, enforce high code coverage (>98%) across unit tests, and achieve functional/semantic correctness of 99% on HumanEval programs and 49% on BringupBench programs.
arXiv Detail & Related papers (2025-06-17T15:06:54Z)
- SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software iteration by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z)
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance. Existing direct preference learning algorithms are originally designed for the single-turn chat task. We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning [1.9854146581797698]
BLAZE is an approach that employs dynamic chunking and hard example learning. It fine-tunes a GPT-based model using challenging bug cases to enhance cross-project and cross-language bug localization. BLAZE achieves up to an increase of 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR).
arXiv Detail & Related papers (2024-07-24T20:44:36Z)
- Example-Based Automatic Migration of Continuous Integration Systems [2.2836654317217326]
Continuous Integration (CI) is a widely adopted practice for faster code change integration and testing.
Developers often migrate between CI systems in pursuit of features like matrix building or better logging.
This migration is effort intensive and error-prone owing to limited knowledge of the new CI system and its syntax.
We propose a novel approach for CI system's automatic migration: CIMig.
arXiv Detail & Related papers (2024-07-02T20:19:21Z)
- Detecting Continuous Integration Skip : A Reinforcement Learning-based Approach [0.4297070083645049]
Continuous Integration (CI) practices facilitate the seamless integration of code changes by employing automated building and testing processes.
Some frameworks, such as Travis CI and GitHub Actions have significantly contributed to simplifying and enhancing the CI process.
Developers continue to encounter difficulties in accurately flagging commits as either suitable for CI execution or as candidates for skipping.
arXiv Detail & Related papers (2024-05-15T18:48:57Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.