A Framework for Measuring the Quality of Infrastructure-as-Code Scripts
- URL: http://arxiv.org/abs/2502.03127v1
- Date: Wed, 05 Feb 2025 12:36:19 GMT
- Title: A Framework for Measuring the Quality of Infrastructure-as-Code Scripts
- Authors: Pandu Ranga Reddy Konala, Vimal Kumar, David Bainbridge, Junaid Haseeb,
- Abstract summary: Infrastructure as Code (IaC) has become integral to modern software development. The rapid proliferation of IaC scripts has highlighted the need for better code quality assessment methods. This paper proposes a new IaC code quality framework, showcased on Ansible repositories as a foundation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Infrastructure as Code (IaC) has become integral to modern software development, enabling automated and consistent configuration of computing environments. The rapid proliferation of IaC scripts has highlighted the need for better code quality assessment methods. This paper proposes a new IaC code quality framework specifically showcased for Ansible repositories as a foundation. By analyzing a comprehensive dataset of repositories from Ansible Galaxy, we applied our framework to evaluate code quality across multiple attributes. The analysis of our code quality metrics applied to Ansible Galaxy repositories reveal trends over time indicating improvements in areas such as metadata and error handling, while highlighting declines in others such as sophistication and automation. The framework offers practitioners a systematic tool for assessing and enhancing IaC scripts, fostering standardization and facilitating continuous improvement. It also provides a standardized foundation for further work into IaC code quality.
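The paper's concrete metric definitions are not reproduced in this listing, but the flavor of per-script quality measurement can be sketched. The snippet below computes a few illustrative signals for a single Ansible playbook (named-task coverage and error-handling coverage); the specific checks and the `site.yml` path are assumptions for illustration, not the framework's actual attributes.

```python
# Illustrative quality signals for an Ansible playbook; the checks below are
# simplified stand-ins, not the paper's actual metric definitions.
import yaml  # PyYAML


def playbook_quality_signals(path: str) -> dict:
    """Compute a few coarse quality indicators for a single playbook file."""
    with open(path, "r", encoding="utf-8") as fh:
        plays = yaml.safe_load(fh) or []

    tasks = []
    for play in plays:
        tasks.extend(play.get("tasks", []) or [])

    named = sum(1 for t in tasks if t.get("name"))
    # Error handling: tasks with a rescue section or an explicit failure policy.
    handled = sum(1 for t in tasks if "rescue" in t or "ignore_errors" in t)

    return {
        "task_count": len(tasks),
        "named_task_ratio": named / len(tasks) if tasks else 0.0,
        "error_handling_ratio": handled / len(tasks) if tasks else 0.0,
    }


if __name__ == "__main__":
    print(playbook_quality_signals("site.yml"))  # hypothetical playbook path
```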
Related papers
- Automated Validation of LLM-based Evaluators for Software Engineering Artifacts [0.7548538278943616]
REFINE (Ranking Evaluators for FIne grained Nuanced Evaluation) is an automated framework for benchmarking LLM-based evaluators of software engineering artifacts. REFINE applies novel generation techniques to automatically synthesize artifacts with progressively reduced quality. It quantifies each candidate evaluator configuration by measuring how closely its rankings align with the expected ordering.
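The alignment idea in the last sentence can be made concrete with a rank correlation. The sketch below scores a hypothetical evaluator against artifacts degraded progressively (so their expected ordering is known) using Kendall's tau; the choice of tau and the sample scores are assumptions, since REFINE's exact alignment measure is not given here.

```python
# Sketch: quantify how well an evaluator's scores respect a known quality
# ordering. Kendall's tau is an assumption; the paper's measure may differ.
from itertools import combinations


def kendall_tau(expected_rank: list[int], scores: list[float]) -> float:
    """Rank correlation between an expected ordering and evaluator scores."""
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        expected = expected_rank[i] - expected_rank[j]
        observed = scores[i] - scores[j]
        if expected * observed > 0:
            concordant += 1
        elif expected * observed < 0:
            discordant += 1
    pairs = len(scores) * (len(scores) - 1) / 2
    return (concordant - discordant) / pairs


# Artifacts degraded progressively: rank 4 = best quality, 1 = worst.
expected = [4, 3, 2, 1]
evaluator_scores = [0.92, 0.81, 0.85, 0.40]  # hypothetical evaluator output
print(kendall_tau(expected, evaluator_scores))  # ~0.67: one inverted pair
```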
arXiv Detail & Related papers (2025-08-04T18:52:01Z) - MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks [56.34018316319873]
We propose MERA Code, a benchmark for evaluating the latest code generation LLMs on Russian-language tasks. The benchmark includes 11 evaluation tasks that span 8 programming languages. We evaluate open LLMs and frontier API models, analyzing their limitations on practical coding tasks in non-English languages.
arXiv Detail & Related papers (2025-07-16T14:31:33Z) - Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements [0.36832029288386137]
This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) into software development. A static code analysis framework detects issues such as bugs, vulnerabilities, and code smells within a large-scale software project. Retrieval-augmented generation (RAG) is implemented to enhance the relevance and precision of the revisions.
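As a rough illustration of the retrieval step, the sketch below ranks previously observed fixes by token overlap with a newly detected issue and assembles a revision prompt. The issue and corpus fields and the commented-out `call_llm` helper are hypothetical; this is not the study's actual pipeline.

```python
# Minimal sketch of retrieval-augmented revision: given an issue reported by a
# static analyzer, retrieve the most similar past fixes and assemble a prompt.
# The issue dict, corpus format, and call_llm() helper are hypothetical.


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens -- a stand-in retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_revision_prompt(issue: dict, past_fixes: list[dict], k: int = 3) -> str:
    ranked = sorted(past_fixes,
                    key=lambda fx: token_overlap(issue["message"], fx["message"]),
                    reverse=True)[:k]
    examples = "\n\n".join(f"Issue: {fx['message']}\nFix:\n{fx['patch']}"
                           for fx in ranked)
    return (f"Static analyzer reported: {issue['message']}\n"
            f"Offending code:\n{issue['snippet']}\n\n"
            f"Similar past fixes:\n{examples}\n\n"
            "Propose a revised version of the offending code.")


# revised = call_llm(build_revision_prompt(issue, past_fixes))  # hypothetical LLM call
```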
arXiv Detail & Related papers (2025-06-12T03:39:25Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 (671B) and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs.
We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models.
Our method outperforms state-of-the-art techniques across multiple benchmarks.
arXiv Detail & Related papers (2025-03-12T11:39:51Z) - FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation [26.14778133391999]
FEA-Bench is a benchmark designed to assess the ability of large language models to perform incremental development within code repositories.
We collect pull requests from 83 GitHub repositories and use rule-based and intent-based filtering to construct task instances focused on new feature development.
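A minimal sketch of what rule-based filtering could look like is given below: keep pull requests whose titles read like feature work and whose diffs add both implementation code and tests. The heuristics and the PR dictionary fields are assumptions, not FEA-Bench's actual construction rules.

```python
# Illustrative rule-based filter for feature-implementation pull requests.
# The heuristics and PR fields are assumptions, not FEA-Bench's actual rules.
import re

FEATURE_HINTS = re.compile(r"\b(add|implement|support|introduce|feature)\b", re.I)


def looks_like_new_feature(pr: dict) -> bool:
    adds_code = any(f["additions"] > 0 and f["path"].endswith(".py")
                    for f in pr["files"])
    adds_tests = any("test" in f["path"].lower() for f in pr["files"])
    titled_as_feature = bool(FEATURE_HINTS.search(pr["title"]))
    return adds_code and adds_tests and titled_as_feature


pr = {
    "title": "Add retry support to the HTTP client",
    "files": [
        {"path": "client/http.py", "additions": 54},
        {"path": "tests/test_http_retry.py", "additions": 32},
    ],
}
print(looks_like_new_feature(pr))  # True
```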
arXiv Detail & Related papers (2025-03-09T16:11:57Z) - On Iterative Evaluation and Enhancement of Code Quality Using GPT-4o [1.5960340244043023]
This paper introduces CodeQUEST, a novel framework leveraging Large Language Models (LLMs) to iteratively evaluate and enhance code quality.
The framework is divided into two main components: an Evaluator that assesses code quality across ten dimensions, providing both quantitative scores and qualitative summaries, and an Optimizer that iteratively improves the code based on the Evaluator's feedback.
Our study demonstrates that CodeQUEST can effectively and robustly evaluate code quality, with its assessments aligning with established code quality metrics.
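The iterative loop the summary describes can be sketched as evaluate-then-improve until every dimension clears a bar. In the sketch below, `evaluate` and `improve` are placeholders for LLM-backed calls, and the dimensions, scale, and threshold are illustrative rather than CodeQUEST's actual design.

```python
# Sketch of an iterative evaluate-and-improve loop in the spirit of the
# summary above. evaluate() and improve() stand in for LLM-backed calls; the
# dimensions and threshold are placeholders, not CodeQUEST's actual design.
DIMENSIONS = ["readability", "robustness", "testability"]  # illustrative subset


def evaluate(code: str) -> dict[str, float]:
    """Hypothetical LLM-backed scorer: one score in [0, 1] per dimension."""
    raise NotImplementedError


def improve(code: str, feedback: dict[str, float]) -> str:
    """Hypothetical LLM-backed rewriter guided by the per-dimension scores."""
    raise NotImplementedError


def quest_loop(code: str, rounds: int = 3, target: float = 0.8) -> str:
    for _ in range(rounds):
        scores = evaluate(code)
        if min(scores.values()) >= target:
            break  # every dimension meets the bar; stop early
        code = improve(code, scores)
    return code
```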
arXiv Detail & Related papers (2025-02-11T09:27:00Z) - Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights [0.0]
This paper introduces a novel scoring mechanism called the SBC score.
It is based on a reverse generation technique that leverages the natural language generation capabilities of Large Language Models.
Unlike direct code analysis, our approach reconstructs system requirements from AI-generated code and compares them with the original specifications.
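The comparison step can be illustrated with a simple coverage check: match each original requirement against the requirements reconstructed from the generated code. Token-overlap matching and the 0.4 threshold below are stand-ins; the actual SBC score computation is not reproduced here.

```python
# Sketch of the comparison step: align requirements reconstructed from the
# generated code against the original specification. Token-overlap matching is
# a stand-in; the actual SBC score computation is not reproduced here.


def overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def requirement_coverage(original: list[str], reconstructed: list[str],
                         threshold: float = 0.4) -> float:
    """Fraction of original requirements matched by some reconstructed one."""
    covered = sum(1 for req in original
                  if any(overlap(req, rec) >= threshold for rec in reconstructed))
    return covered / len(original) if original else 0.0


original_spec = ["The service retries failed uploads up to three times",
                 "Uploads larger than 5 MB are rejected"]
reconstructed = ["Failed uploads are retried up to three times"]
print(requirement_coverage(original_spec, reconstructed))  # 0.5
```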
arXiv Detail & Related papers (2025-02-11T01:12:11Z) - Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation [37.25839260805938]
Skeleton-Guided-Translation is a framework for repository-level Java-to-C# code translation with fine-grained quality evaluation. We present TRANSREPO-BENCH, a benchmark of high-quality open-source Java repositories and their corresponding C# skeletons.
arXiv Detail & Related papers (2025-01-27T13:44:51Z) - Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
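The calling-chain extraction can be illustrated with Python's built-in tracing hook: record which project functions are entered while a unit test runs. The sample `parse`/`load`/`test_load` functions are hypothetical; only the chain-extraction idea is shown, not Codev-Agent itself.

```python
# Sketch: capture the chain of project functions invoked while a unit test
# runs, using Python's built-in tracing hook (sys.settrace).
import sys


def extract_calling_chain(test_fn, project_prefix: str) -> list[str]:
    chain = []

    def tracer(frame, event, arg):
        # Record only calls into modules under the project's namespace.
        if event == "call" and frame.f_globals.get("__name__", "").startswith(project_prefix):
            chain.append(frame.f_code.co_name)
        return tracer

    sys.settrace(tracer)
    try:
        test_fn()
    finally:
        sys.settrace(None)
    return chain


# Hypothetical project code and test, all defined in this file.
def parse(cfg): return {"retries": int(cfg["retries"])}
def load(path): return parse({"retries": "3"})
def test_load(): assert load("app.cfg")["retries"] == 3


# "__main__" matches functions defined here when run as a script.
print(extract_calling_chain(test_load, "__main__"))
# ['test_load', 'load', 'parse']
```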
arXiv Detail & Related papers (2024-10-02T09:11:10Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation.
We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks.
We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z) - On the Impacts of Contexts on Repository-Level Code Generation [5.641402231731082]
We present a novel benchmark designed to evaluate repository-level code generation.
We focus on three key aspects: executability, functional correctness through comprehensive test case generation, and accurate utilization of cross-file contexts.
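Two of those aspects, executability and test-based functional correctness, can be sketched with a small subprocess harness that runs a generated solution on its own and then with each generated test appended. The harness shape and the toy `add` example are assumptions, not the benchmark's actual protocol.

```python
# Sketch: score a generated solution on executability (does it run at all?)
# and functional correctness (fraction of generated tests that pass).
import subprocess
import sys


def run_snippet(code: str, timeout: int = 10) -> bool:
    """Return True if the snippet executes without raising."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, timeout=timeout)
    return proc.returncode == 0


def score_solution(solution: str, test_snippets: list[str]) -> dict:
    executable = run_snippet(solution)
    passed = sum(run_snippet(solution + "\n" + t) for t in test_snippets) if executable else 0
    return {
        "executable": executable,
        "pass_rate": passed / len(test_snippets) if test_snippets else 0.0,
    }


solution = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(score_solution(solution, tests))  # {'executable': True, 'pass_rate': 1.0}
```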
arXiv Detail & Related papers (2024-06-17T10:45:22Z) - Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval. DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task. Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z) - Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution [48.86322922826514]
This paper defines a new task of Knowledge-aware Language Model Attribution (KaLMA).
First, we extend attribution source from unstructured texts to Knowledge Graph (KG), whose rich structures benefit both the attribution performance and working scenarios.
Second, we propose a new "Conscious Incompetence" setting considering the incomplete knowledge repository.
Third, we propose a comprehensive automatic evaluation metric encompassing text quality, citation quality, and text citation alignment.
arXiv Detail & Related papers (2023-10-09T11:45:59Z)