Can OpenAI Codex and Other Large Language Models Help Us Fix Security Bugs?
- URL: http://arxiv.org/abs/2112.02125v1
- Date: Fri, 3 Dec 2021 19:15:02 GMT
- Title: Can OpenAI Codex and Other Large Language Models Help Us Fix Security
Bugs?
- Authors: Hammond Pearce and Benjamin Tan and Baleegh Ahmad and Ramesh Karri and
Brendan Dolan-Gavitt
- Abstract summary: We examine the use of large language models (LLMs) for code repair.
We investigate challenges in the design of prompts that coax LLMs into generating repaired versions of insecure code.
Experiments show that LLMs could collectively repair 100% of our synthetically generated and hand-crafted scenarios.
- Score: 8.285068188878578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human developers can produce code with cybersecurity weaknesses. Can emerging
'smart' code completion tools help repair those weaknesses? In this work, we
examine the use of large language models (LLMs) for code (such as OpenAI's
Codex and AI21's Jurassic J-1) for zero-shot vulnerability repair. We
investigate challenges in the design of prompts that coax LLMs into generating
repaired versions of insecure code. This is difficult due to the numerous ways
to phrase key information -- both semantically and syntactically -- with
natural languages. By performing a large scale study of four commercially
available, black-box, "off-the-shelf" LLMs, as well as a locally-trained model,
on a mix of synthetic, hand-crafted, and real-world security bug scenarios, our
experiments show that LLMs could collectively repair 100% of our synthetically
generated and hand-crafted scenarios, as well as 58% of vulnerabilities in a
selection of historical bugs in real-world open-source projects.
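To make the prompt-design challenge concrete, the sketch below shows one way a zero-shot repair prompt could be assembled and sent to a completion model. This is a minimal illustration in the spirit of the study, not the paper's actual templates: the `query_llm` wrapper, the CWE-787 snippet, and the comment markers are hypothetical placeholders standing in for whichever model (Codex, Jurassic J-1, or a locally-trained model) and template variant is under evaluation.

```python
# Minimal sketch of a zero-shot vulnerability-repair prompt.
# `query_llm` is a hypothetical stand-in for any completion API or local model.

VULNERABLE_SNIPPET = """\
// CWE-787: out-of-bounds write
char buf[10];
strcpy(buf, user_input);   // no bounds check
"""

def build_repair_prompt(code: str, cwe_hint: str) -> str:
    """Wrap the insecure code in comments that ask the model to emit a fixed version."""
    return (
        f"// BUG: {cwe_hint}\n"
        f"{code}\n"
        "// FIXED VERSION:\n"
    )

def repair(code: str, cwe_hint: str, query_llm) -> str:
    """Query the model once (zero-shot) and return its proposed patch."""
    prompt = build_repair_prompt(code, cwe_hint)
    # The study sweeps temperatures and samples many completions per prompt;
    # a single call is shown here for brevity.
    return query_llm(prompt, max_tokens=256, temperature=0.4)

# Example usage (with a caller-supplied query_llm):
# patched = repair(VULNERABLE_SNIPPET, "CWE-787 (out-of-bounds write)", query_llm)
```

Candidate patches would then be checked for both security (e.g., with a static analyzer) and functional correctness before being accepted.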
Related papers
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
- An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [17.69409515806874]
We present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation.
We crawled a fine-tuning dataset for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories.
Our exploration reveals that fine-tuning LLMs can improve secure code generation by 6.4% in C and by 5.4% in C++.
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
- Software Vulnerability and Functionality Assessment using LLMs [0.8057006406834466]
We investigate whether Large Language Models (LLMs) can aid with code reviews.
Our investigation focuses on two tasks that we argue are fundamental to good reviews.
arXiv Detail & Related papers (2024-03-13T11:29:13Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models to code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z)
- Weak-to-Strong Jailbreaking on Large Language Models [96.50953637783581]
Large language models (LLMs) are vulnerable to jailbreak attacks.
Existing jailbreaking methods are computationally costly.
We propose the weak-to-strong jailbreaking attack.
arXiv Detail & Related papers (2024-01-30T18:48:37Z)
- LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce SecRepair, a multipurpose code vulnerability analysis system powered by a large language model, CodeGen2.
Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs.
We identify zero-day and N-day vulnerabilities in six open-source IoT operating systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)
- Can LLMs Patch Security Issues? [1.3299507495084417]
Large Language Models (LLMs) have shown impressive proficiency in code generation.
LLMs share a weakness with their human counterparts: producing code that inadvertently has security vulnerabilities.
We propose Feedback-Driven Security Patching (FDSP), where LLMs automatically refine generated, vulnerable code.
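The sketch below illustrates the general shape of such a feedback-driven patching loop. It is a hedged illustration, not the paper's implementation: `query_llm` and `run_security_checker` are hypothetical helpers standing in for any code LLM and any static analyzer that returns textual findings for a candidate program.

```python
# Hedged sketch of a feedback-driven patching loop (FDSP-style idea).
# `query_llm` and `run_security_checker` are hypothetical stand-ins.

def feedback_driven_patch(code: str, query_llm, run_security_checker, max_rounds: int = 3) -> str:
    """Iteratively ask the model to repair its own code using analyzer feedback."""
    candidate = code
    for _ in range(max_rounds):
        findings = run_security_checker(candidate)
        if not findings:                      # no remaining issues: accept the patch
            return candidate
        prompt = (
            "The following code has security issues:\n"
            f"{candidate}\n\n"
            "Reported issues:\n" + "\n".join(f"- {f}" for f in findings) +
            "\n\nRewrite the code so these issues are fixed:\n"
        )
        candidate = query_llm(prompt)
    return candidate                          # best effort after the round budget is spent
```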
arXiv Detail & Related papers (2023-11-13T08:54:37Z)
- A LLM Assisted Exploitation of AI-Guardian [57.572998144258705]
We evaluate the robustness of AI-Guardian, a recent defense to adversarial examples published at IEEE S&P 2023.
We write none of the code to attack this model, and instead prompt GPT-4 to implement all attack algorithms following our instructions and guidance.
This process was surprisingly effective and efficient, with the language model at times producing code from ambiguous instructions faster than the author of this paper could have done.
arXiv Detail & Related papers (2023-07-20T17:33:25Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)