Automated DevOps Pipeline Generation for Code Repositories using Large
Language Models
- URL: http://arxiv.org/abs/2312.13225v1
- Date: Wed, 20 Dec 2023 17:47:52 GMT
- Title: Automated DevOps Pipeline Generation for Code Repositories using Large
Language Models
- Authors: Deep Mehta, Kartik Rawool, Subodh Gujar, Bowen Xu
- Abstract summary: The research scrutinizes the proficiency of GPT 3.5 and GPT 4 in generating GitHub workflows, while assessing the influence of various prompt elements in constructing the most efficient pipeline.
Results indicate substantial advancements in GPT 4.
The research introduces a GitHub App built on Probot, empowering users to automate workflow generation within the GitHub ecosystem.
- Score: 5.011328607647701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automating software development processes through the orchestration of GitHub
Action workflows has revolutionized the efficiency and agility of software
delivery pipelines. This paper presents a detailed investigation into the use
of Large Language Models (LLMs), specifically GPT 3.5 and GPT 4, to generate and
evaluate GitHub Action workflows for DevOps tasks. Our methodology involves
data collection from public GitHub repositories, prompt engineering for LLM
utilization, and evaluation metrics encompassing exact match scores, BLEU
scores, and a novel DevOps Aware score. The research scrutinizes the
proficiency of GPT 3.5 and GPT 4 in generating GitHub workflows, while
assessing the influence of various prompt elements in constructing the most
efficient pipeline. Results indicate substantial advancements in GPT 4,
particularly in DevOps awareness and syntax correctness. The research
introduces a GitHub App built on Probot, empowering users to automate workflow
generation within the GitHub ecosystem. This study contributes insights into the
evolving landscape of AI-driven automation in DevOps practices.
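As a concrete illustration of the pipeline the abstract describes, the sketch below prompts a GPT model for a GitHub Actions workflow and scores the output against the repository's existing one with exact match and BLEU. The prompt wording, model name, file path, and whitespace tokenization are assumptions of this sketch rather than details from the paper, and the paper's novel DevOps Aware score is not public, so it is not reproduced here.

```python
# Minimal generate-then-evaluate sketch, assuming the OpenAI Python SDK (v1+)
# and NLTK are installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

client = OpenAI()

def generate_workflow(repo_description: str, model: str = "gpt-4") -> str:
    """Ask the model for a GitHub Actions workflow for the described repo."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You write GitHub Actions workflow YAML files."},
            {"role": "user",
             "content": f"Generate a CI workflow for this repository:\n"
                        f"{repo_description}\nReturn only the YAML."},
        ],
    )
    return response.choices[0].message.content

def evaluate(generated: str, reference: str) -> dict:
    """Score a generated workflow against the repository's real one."""
    exact = float(generated.strip() == reference.strip())
    bleu = sentence_bleu(
        [reference.split()], generated.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    return {"exact_match": exact, "bleu": bleu}

if __name__ == "__main__":
    # Hypothetical paths and repo description, for illustration only.
    reference_yaml = open(".github/workflows/ci.yml").read()
    candidate = generate_workflow("A Python package tested with pytest")
    print(evaluate(candidate, reference_yaml))
```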
Related papers
- WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We present WorkflowLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration.
It first constructs WorkflowBench, a large-scale fine-tuning dataset with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories.
WorkflowLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z)
- AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains.
We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search.
Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- The Hidden Costs of Automation: An Empirical Study on GitHub Actions Workflow Maintenance [45.53834452021771]
GitHub Actions (GA) is an orchestration platform that streamlines the automatic execution of engineering tasks.
Human intervention is necessary to correct defects, update dependencies, or refactor existing workflow files.
arXiv Detail & Related papers (2024-09-04T01:33:16Z)
- Automatic Categorization of GitHub Actions with Transformers and Few-shot Learning [12.254055731378045]
GitHub Actions (GHA) have been conceived to provide developers with a practical tool to create and maintain a pipeline.
To expose actions to search engines, GitHub allows developers to assign them to one or more categories manually.
We propose Gavel, a practical solution for increasing the visibility of actions on GitHub; a brief sketch of this categorization idea appears after this list.
arXiv Detail & Related papers (2024-07-24T02:27:36Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- On the effectiveness of Large Language Models for GitHub Workflows [9.82254417875841]
Large Language Models (LLMs) have demonstrated their effectiveness in various software development tasks.
We perform the first comprehensive study to understand the effectiveness of LLMs on five workflow-related tasks with different levels of prompts.
Our evaluation of three state-of-the-art LLMs and their fine-tuned variants revealed various interesting findings on the current effectiveness and drawbacks of LLMs.
arXiv Detail & Related papers (2024-03-19T05:14:12Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)
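Returning to the Gavel entry above: Gavel's actual model and few-shot training setup are not reproduced here. The sketch below substitutes an off-the-shelf zero-shot NLI classifier from Hugging Face to convey the underlying idea of transformer-based category assignment; the category list and the example action are illustrative assumptions.

```python
# A hedged sketch of transformer-based categorization of GitHub Actions,
# assuming the Hugging Face transformers library is installed. This is NOT
# Gavel's method: it swaps in zero-shot classification with a pretrained
# NLI model to keep the example self-contained.
from transformers import pipeline

# A few of GitHub's marketplace categories, for illustration only.
CATEGORIES = ["continuous integration", "deployment", "code quality",
              "dependency management", "testing", "security"]

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def categorize_action(name: str, description: str) -> str:
    """Return the most plausible category for a GitHub Action."""
    result = classifier(f"{name}: {description}", candidate_labels=CATEGORIES)
    return result["labels"][0]  # labels come back sorted by score

print(categorize_action(
    "setup-python",
    "Set up a specific version of Python and add it to PATH.",
))
```

Gavel itself relies on fine-tuning with few-shot learning rather than zero-shot inference; the substitution here is only to keep the example runnable without Gavel's artifacts.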
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.