Related papers: Can Developers rely on LLMs for Secure IaC Development?

Can Developers rely on LLMs for Secure IaC Development?

URL: http://arxiv.org/abs/2602.03648v1
Date: Tue, 03 Feb 2026 15:32:36 GMT
Title: Can Developers rely on LLMs for Secure IaC Development?
Authors: Ehsan Firouzi, Shardul Bhatt, Mohammad Ghafari,
Abstract summary: GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development were investigated.<n>For security smell detection, models detected at least 71% of security smells when prompted to analyze code from a security perspective.<n>For secure code generation, we prompted LLMs with 89 vulnerable synthetic scenarios and observed that only 7% of the generated scripts were secure.
Score: 2.2940141855172036
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigated the capabilities of GPT-4o and Gemini 2.0 Flash for secure Infrastructure as Code (IaC) development. For security smell detection, on the Stack Overflow dataset, which primarily contains small, simplified code snippets, the models detected at least 71% of security smells when prompted to analyze code from a security perspective (general prompt). With a guided prompt (adding clear, step-by-step instructions), this increased to 78%.In GitHub repositories, which contain complete, real-world project scripts, a general prompt was less effective, leaving more than half of the smells undetected. However, with the guided prompt, the models uncovered at least 67% of the smells. For secure code generation, we prompted LLMs with 89 vulnerable synthetic scenarios and observed that only 7% of the generated scripts were secure. Adding an explicit instruction to generate secure code increased GPT secure output rate to 17%, while Gemini changed little (8%). These results highlight the need for further research to improve LLMs' capabilities in assisting developers with secure IaC development.

Related papers

Trace: Securing Smart Contract Repository Against Access Control Vulnerability [58.02691083789239]
GitHub hosts numerous smart contract repositories containing source code, documentation, and configuration files.<n>Third-party developers often reference, reuse, or fork code from these repositories during custom development.<n>Existing tools for detecting smart contract vulnerabilities are limited in their ability to handle complex repositories.
arXiv Detail & Related papers (2025-10-22T05:18:28Z)
Prompt, Synthesize, Fine-Tune: A Secure Code Generation Recipe [16.177098761970683]
We present Secure-Instruct, a framework that automatically synthesizes high-quality vulnerable and secure code examples.<n>We find that Secure-Instruct improves not only the security but also the functional correctness of the generated code.
arXiv Detail & Related papers (2025-10-08T16:24:09Z)
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [58.65256663334316]
We present SafeAgentBench -- the first benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments.<n>SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 9 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z)
RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation. RedCode-Exec provides challenging prompts that could lead to risky code execution. RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
From Solitary Directives to Interactive Encouragement! LLM Secure Code Generation by Natural Language Prompting [24.27542373791212]
This work introduces SecCode, a framework that leverages an innovative interactive encouragement prompting (EP) technique for secure code generation with textitonly NL prompts. SecCode functions through three stages: 1) Code Generation using NL Prompts; 2) Code Vulnerability Detection and Fixing, utilising our proposed encouragement prompting; 3) Vulnerability Cross-Checking and Code Security Refinement.
arXiv Detail & Related papers (2024-10-18T09:32:08Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
How Well Do Large Language Models Serve as End-to-End Secure Code Agents for Python? [42.119319820752324]
We studied GPT-3.5 and GPT-4's capability to identify and repair vulnerabilities in the code generated by four popular LLMs.<n>By manually or automatically reviewing 4,900 pieces of code, our study reveals that large language models lack awareness of scenario-relevant security risks.<n>To address the limitation of a single round of repair, we developed a lightweight tool that prompts LLMs to construct safer source code.
arXiv Detail & Related papers (2024-08-20T02:42:29Z)
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [16.000227726163967]
We study whether fine-tuning pre-trained Large Language Models on datasets of vulnerability-fixing commits can promote secure code generation.<n>We crawl a fine-tuning dataset (14,622 C/C++ files) for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories.<n>We observe maximum improvements in security of 6.4% in C language and 5.0% in C++ language.
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection [17.948513691133037]
We introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures.
arXiv Detail & Related papers (2024-06-10T22:10:05Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
PAL: Proxy-Guided Black-Box Attack on Large Language Models [55.57987172146731]
Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated capabilities to generate harmful content when manipulated. We introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting. Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art.
arXiv Detail & Related papers (2024-02-15T02:54:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.