Automated DevOps Pipeline Generation for Code Repositories using Large
Language Models
- URL: http://arxiv.org/abs/2312.13225v1
- Date: Wed, 20 Dec 2023 17:47:52 GMT
- Title: Automated DevOps Pipeline Generation for Code Repositories using Large
Language Models
- Authors: Deep Mehta, Kartik Rawool, Subodh Gujar, Bowen Xu
- Abstract summary: The research scrutinizes the proficiency of GPT 3.5 and GPT 4 in generating GitHub workflows, while assessing the influence of various prompt elements in constructing the most efficient pipeline.
Results indicate substantial advancements in GPT 4.
The research introduces a GitHub App built on Probot, empowering users to automate workflow generation within the GitHub ecosystem.
- Score: 5.011328607647701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating software development processes through the orchestration of GitHub
Action workflows has revolutionized the efficiency and agility of software
delivery pipelines. This paper presents a detailed investigation into the use
of Large Language Models (LLMs), specifically GPT 3.5 and GPT 4, to generate and
evaluate GitHub Action workflows for DevOps tasks. Our methodology involves
data collection from public GitHub repositories, prompt engineering for LLM
utilization, and evaluation metrics encompassing exact match scores, BLEU
scores, and a novel DevOps Aware score. The research scrutinizes the
proficiency of GPT 3.5 and GPT 4 in generating GitHub workflows, while
assessing the influence of various prompt elements in constructing the most
efficient pipeline. Results indicate substantial advancements in GPT 4,
particularly in DevOps awareness and syntax correctness. The research
introduces a GitHub App built on Probot, empowering users to automate workflow
generation within the GitHub ecosystem. This study contributes insights into the
evolving landscape of AI-driven automation in DevOps practices.
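As a concrete illustration of the pipeline the abstract describes, the sketch below prompts a GPT model for a GitHub Actions workflow and scores the output against the repository's existing one with exact match and BLEU. The prompt wording, model name, file path, and whitespace tokenization are assumptions of this sketch rather than details from the paper, and the paper's novel DevOps Aware score is not public, so it is not reproduced here.

```python
# Minimal generate-then-evaluate sketch, assuming the OpenAI Python SDK (v1+)
# and NLTK are installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

client = OpenAI()

def generate_workflow(repo_description: str, model: str = "gpt-4") -> str:
    """Ask the model for a GitHub Actions workflow for the described repo."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You write GitHub Actions workflow YAML files."},
            {"role": "user",
             "content": f"Generate a CI workflow for this repository:\n"
                        f"{repo_description}\nReturn only the YAML."},
        ],
    )
    return response.choices[0].message.content

def evaluate(generated: str, reference: str) -> dict:
    """Score a generated workflow against the repository's real one."""
    exact = float(generated.strip() == reference.strip())
    bleu = sentence_bleu(
        [reference.split()], generated.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    return {"exact_match": exact, "bleu": bleu}

if __name__ == "__main__":
    # Hypothetical paths and repo description, for illustration only.
    reference_yaml = open(".github/workflows/ci.yml").read()
    candidate = generate_workflow("A Python package tested with pytest")
    print(evaluate(candidate, reference_yaml))
```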
Related papers
- WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We present WorkflowLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration.
It first constructs WorkflowBench, a large-scale fine-tuning dataset with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories.
WorkflowLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z)
- AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains.
We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search.
Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- The Hidden Costs of Automation: An Empirical Study on GitHub Actions Workflow Maintenance [45.53834452021771]
GitHub Actions (GA) is an orchestration platform that streamlines the automatic execution of engineering tasks.
Human intervention is necessary to correct defects, update dependencies, or refactor existing workflow files.
arXiv Detail & Related papers (2024-09-04T01:33:16Z)
- Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning [12.254055731378045]
GitHub Actions (GHA) have been conceived to provide developers with a practical tool to create and maintain a pipeline.
To expose actions to search engines, GitHub allows developers to assign them to one or more categories manually.
We propose Gavel, a practical solution for increasing the visibility of actions on GitHub; a brief sketch of this categorization idea appears after this list.
arXiv Detail & Related papers (2024-07-24T02:27:36Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- On the effectiveness of Large Language Models for GitHub Workflows [9.82254417875841]
Large Language Models (LLMs) have demonstrated their effectiveness in various software development tasks.
We perform the first comprehensive study to understand the effectiveness of LLMs on five workflow-related tasks with different levels of prompts.
Our evaluation of three state-of-the-art LLMs and their fine-tuned variants revealed various interesting findings on the current effectiveness and drawbacks of LLMs.
arXiv Detail & Related papers (2024-03-19T05:14:12Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
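Returning to the Gavel entry above: Gavel's actual model and few-shot training setup are not reproduced here. The sketch below substitutes an off-the-shelf zero-shot NLI classifier from Hugging Face to convey the underlying idea of transformer-based category assignment; the category list and the example action are illustrative assumptions.

```python
# A hedged sketch of transformer-based categorization of GitHub Actions,
# assuming the Hugging Face transformers library is installed. This is NOT
# Gavel's method: it swaps in zero-shot classification with a pretrained
# NLI model to keep the example self-contained.
from transformers import pipeline

# A few of GitHub's marketplace categories, for illustration only.
CATEGORIES = ["continuous integration", "deployment", "code quality",
              "dependency management", "testing", "security"]

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def categorize_action(name: str, description: str) -> str:
    """Return the most plausible category for a GitHub Action."""
    result = classifier(f"{name}: {description}", candidate_labels=CATEGORIES)
    return result["labels"][0]  # labels come back sorted by score

print(categorize_action(
    "setup-python",
    "Set up a specific version of Python and add it to PATH.",
))
```

Gavel itself relies on fine-tuning with few-shot learning rather than zero-shot inference; the substitution here is only to keep the example runnable without Gavel's artifacts.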
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.