LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward
- URL: http://arxiv.org/abs/2401.03374v2
- Date: Thu, 22 Feb 2024 00:29:37 GMT
- Title: LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward
- Authors: Nafis Tanveer Islam, Joseph Khoury, Andrew Seong, Mohammad Bahrami Karkevandi, Gonzalo De La Torre Parra, Elias Bou-Harb, Peyman Najafirad
- Abstract summary: We introduce a multipurpose code vulnerability analysis system, SecRepair, powered by a large language model, CodeGen2.
Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs.
We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In software development, the predominant emphasis on functionality often
supersedes security concerns, a trend gaining momentum with AI-driven
automation tools like GitHub Copilot. These tools significantly improve
developers' efficiency in functional code development. Nevertheless, it remains
a notable concern that such tools are also responsible for creating insecure
code, predominantly because they are pre-trained on publicly available
repositories containing vulnerable code. Moreover, developers are called the
"weakest link in the chain" since they often have minimal knowledge of code
security. Although existing solutions provide reasonable fixes for vulnerable
code, they must also adequately describe the security issues and educate
developers about them so that the same issues are not repeated. Therefore, we
introduce a multipurpose code vulnerability analysis system, SecRepair, powered
by a large language model, CodeGen2, that assists the developer in identifying
vulnerabilities and generating fixed code, along with a complete description of
the vulnerability in a code comment. Our methodology uses a reinforcement
learning paradigm to generate code comments, augmented by a semantic reward
mechanism. Inspired by how humans fix code issues, we propose an
instruction-based dataset suitable for vulnerability analysis with LLMs. We
further identify zero-day and N-day vulnerabilities in 6 Open Source IoT
Operating Systems on GitHub. Our findings show that incorporating reinforcement
learning coupled with a semantic reward improves our model's performance,
strengthening its capacity to address code vulnerabilities effectively.
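The abstract names two ingredients, an instruction-based dataset and reinforcement learning with a semantic reward for the generated code comment, without spelling out either. The sketch below illustrates the general idea under stated assumptions and is not SecRepair's implementation: the `InstructionExample` record and its field names are hypothetical, bag-of-words cosine similarity stands in for an embedding-based semantic similarity, and the batch-mean baseline in `advantages` is a generic REINFORCE-style choice rather than the paper's training objective.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class InstructionExample:
    # Hypothetical record layout; the field names are illustrative, not the paper's schema.
    instruction: str
    vulnerable_code: str
    fixed_code: str
    reference_comment: str  # reference description of the vulnerability


def _bag_of_words(text: str) -> Counter:
    # Lowercased word tokens; a crude stand-in for a learned sentence embedding.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def semantic_reward(generated: str, reference: str) -> float:
    """Cosine similarity in [0, 1] between the word-count vectors of two comments."""
    a, b = _bag_of_words(generated), _bag_of_words(reference)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty comment: no semantic overlap to reward
    return dot / (norm_a * norm_b)


def advantages(rewards: list[float]) -> list[float]:
    # REINFORCE-style advantage with the batch-mean reward as baseline:
    # positive values push the policy toward a sample, negative values away from it.
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Usage: score two sampled comments for one (hypothetical) training example.
example = InstructionExample(
    instruction="Fix the vulnerability and explain it in a code comment.",
    vulnerable_code="char buf[8]; strcpy(buf, input);",
    fixed_code="char buf[8]; snprintf(buf, sizeof(buf), \"%s\", input);",
    reference_comment="Buffer overflow: strcpy copies input without checking the size of buf.",
)
samples = [
    "strcpy copies input into buf without a size check, causing a buffer overflow.",
    "Renamed the variable for readability.",
]
rewards = [semantic_reward(s, example.reference_comment) for s in samples]
adv = advantages(rewards)
```

Because the reward is continuous, a paraphrased but correct explanation still earns partial credit while an unrelated comment scores low, which is one common motivation for semantic rather than exact-match rewards. Replacing `_bag_of_words` with a pretrained sentence encoder would change only the similarity computation.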
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
Date: 2024-11-12T13:30:06Z
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
Date: 2024-09-10T12:01:43Z
- Unintentional Security Flaws in Code: Automated Defense via Root Cause Analysis
We developed an automated vulnerability root cause (RC) toolkit called T5-RCGCN.
It combines T5 language model embeddings with a graph convolutional network (GCN) for vulnerability classification and localization.
We tested T5-RCGCN with 56 junior developers across three datasets, showing a 28.9% improvement in code security compared to previous methods.
Date: 2024-08-30T18:26:59Z
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval
Large language models (LLMs) have brought significant advancements to code generation and code repair.
However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities.
We present a comprehensive study aimed at precisely evaluating and enhancing the security of code LLMs.
Date: 2024-07-02T16:13:21Z
- Agent-Driven Automatic Software Improvement
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs).
The iterative nature of agents, which allows for continuous learning and adaptation, can help overcome common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
Date: 2024-06-24T15:45:22Z
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
Date: 2024-03-12T17:55:38Z
- Causative Insights into Open Source Software Security using Large Language Code Embeddings and Semantic Vulnerability Graph
Open Source Software (OSS) vulnerabilities can cause unauthorized access, data breaches, network disruptions, and privacy violations.
Recent deep-learning techniques have shown great promise in identifying and localizing vulnerabilities in source code.
Our study shows a 24% improvement in code repair capabilities compared to previous methods.
Date: 2024-01-13T10:33:22Z
- Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation
Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers.
However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities.
This paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective.
Date: 2023-10-25T00:32:56Z
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
Date: 2023-02-08T11:54:07Z
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
Date: 2021-12-20T22:45:27Z
- Multi-context Attention Fusion Neural Network for Software Vulnerability Identification
We propose a deep learning model that efficiently learns to detect some of the common categories of security vulnerabilities in source code.
The model builds an accurate understanding of code semantics with far fewer learnable parameters.
The proposed model achieves a 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
Date: 2021-04-19T11:50:36Z
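The last two summaries quote top-1 accuracy (VELVET) and an F1-score (the multi-context attention fusion model). As a reminder of what those figures measure, here is a minimal sketch of the standard definitions: binary F1 = 2PR / (P + R), and top-1 accuracy as the fraction of programs whose highest-scoring statement is the labeled vulnerable one. Assumptions: label 1 marks the vulnerable class and each program has a single labeled vulnerable statement; the papers themselves may aggregate differently (for example, averaging F1 across CWE categories).

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2PR / (P + R), treating label 1 as the positive (vulnerable) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    return 2 * precision * recall / (precision + recall)


def top1_accuracy(scores: Sequence[Sequence[float]], vulnerable_line: Sequence[int]) -> float:
    """Fraction of programs whose highest-scoring statement index equals the labeled one."""
    if not scores:
        return 0.0
    hits = sum(
        1
        for s, gold in zip(scores, vulnerable_line)
        if max(range(len(s)), key=s.__getitem__) == gold  # argmax over statements
    )
    return hits / len(scores)
```

For example, `f1_score([1, 1, 0, 0], [1, 0, 1, 0])` has one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and F1 is 0.5.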