Related papers: Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

URL: http://arxiv.org/abs/2310.16263v1
Date: Wed, 25 Oct 2023 00:32:56 GMT
Title: Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation
Authors: Jiexin Wang, Liuwen Cao, Xitong Luo, Zhiping Zhou, Jiayuan Xie, Adam Jatowt, Yi Cai
Abstract summary: Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. This paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective.
Score: 24.668682498171776
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. To effectively mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen\footnote{SecuCoGen has been uploaded as supplemental material and will be made publicly available after publication.}, a meticulously curated dataset targeting 21 critical vulnerability types. SecuCoGen comprises 180 samples and serves as the foundation for conducting experiments on three crucial code-related tasks: code generation, code repair and vulnerability classification, with a strong emphasis on security. Our experimental results reveal that existing models often overlook security concerns during code generation, leading to the generation of vulnerable code. To address this, we propose effective approaches to mitigate the security vulnerabilities and enhance the overall robustness of code generated by LLMs. Moreover, our study identifies weaknesses in existing models' ability to repair vulnerable code, even when provided with vulnerability information. Additionally, certain vulnerability types pose challenges for the models, hindering their performance in vulnerability classification. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs, thereby leading to safer and more trustworthy model deployment.

Related papers

Learning to Generate Secure Code via Token-Level Rewards [11.539519023515021]
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities.<n>We propose Vul2Safe, a new secure code generation framework that leverages self-reflection to construct high-confidence repair pairs from real-world vulnerabilities.<n>We also introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security.
arXiv Detail & Related papers (2026-02-26T12:57:27Z)
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment.<n>We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z)
Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security [63.41350337821108]
We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs)<n>SecTOW consists of two modules: a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO)<n>We show that SecTOW significantly improves security while preserving general performance.
arXiv Detail & Related papers (2025-07-29T17:39:48Z)
Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation [16.29310628754089]
Large Language Models (LLMs) have become powerful tools for automated code generation.<n>LLMs often overlook critical security practices, which can result in the generation of insecure code.<n>This paper examines their inherent tendencies to produce insecure code, their capability to generate secure code when guided by self-generated vulnerability hints, and their effectiveness in repairing vulnerabilities when provided with different levels of feedback.
arXiv Detail & Related papers (2025-06-28T23:24:33Z)
Large Language Model Unlearning for Source Code [65.42425213605114]
PROD is a novel unlearning approach that enables LLMs to forget undesired code content while preserving their code generation capabilities.<n>Our evaluation demonstrates that PROD achieves superior balance between forget quality and model utility compared to existing unlearning approaches.
arXiv Detail & Related papers (2025-06-20T16:27:59Z)
SafeGenBench: A Benchmark Framework for Security Vulnerability Detection in LLM-Generated Code [7.209766132478914]
We introduce SafeGenBench, a benchmark specifically designed to assess the security of LLM-generated code.<n>The dataset encompasses a wide range of common software development scenarios and vulnerability types.<n>Through the empirical evaluation of state-of-the-art LLMs on SafeGenBench, we reveal notable deficiencies in their ability to produce vulnerability-free code.
arXiv Detail & Related papers (2025-06-06T02:48:02Z)
Give LLMs a Security Course: Securing Retrieval-Augmented Code Generation via Knowledge Injection [5.011290848820237]
Existing Retrieval-Augmented Code Generation (RACG) systems largely overlook security, leading to substantial risks. We propose a security-hardening framework for RACG systems, CodeGuarder, that shifts the paradigm from retrieving only functional code examples to incorporating both functional code and security knowledge. Our framework constructs a security knowledge base from real-world vulnerability databases, including secure code samples and root cause annotations.
arXiv Detail & Related papers (2025-04-23T05:27:27Z)
ProSec: Fortifying Code LLMs with Proactive Security Alignment [14.907702430331803]
Security of code-specific large language models (LLMs) remains under-explored. We propose ProSec, a novel security alignment approach designed to align code LLMs with secure coding practices. Experiments show that models trained with ProSec is 29.2% to 35.5% more secure compared to previous work.
arXiv Detail & Related papers (2024-11-19T22:00:01Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation [17.69409515806874]
We present an exploratory study on whether fine-tuning pre-trained LLMs on datasets of vulnerability-fixing commits can promote secure code generation. We crawled a fine-tuning dataset for secure code generation by collecting code fixes of confirmed vulnerabilities from open-source repositories. Our exploration reveals that fine-tuning LLMs can improve secure code generation by 6.4% in C language and 5.4% in C++ language.
arXiv Detail & Related papers (2024-08-17T02:51:27Z)
ShieldGemma: Generative AI Content Moderation Based on Gemma [49.91147965876678]
ShieldGemma is a suite of safety content moderation models built upon Gemma2. Models provide robust, state-of-the-art predictions of safety risks across key harm types.
arXiv Detail & Related papers (2024-07-31T17:48:14Z)
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval [20.959848710829878]
Large language models (LLMs) have brought significant advancements to code generation and code repair. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. We aim to present a comprehensive study aimed at precisely evaluating and enhancing the security aspects of code LLMs.
arXiv Detail & Related papers (2024-07-02T16:13:21Z)
Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z)
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs. Our studies reveal a new and universal safety vulnerability of these models against code input. We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward [3.729516018513228]
We introduce a multipurpose code vulnerability analysis system textttSecRepair, powered by a large language model, CodeGen2. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub.
arXiv Detail & Related papers (2024-01-07T02:46:39Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
RoFL: Attestable Robustness for Secure Federated Learning [59.63865074749391]
Federated Learning allows a large number of clients to train a joint model without the need to share their private data. To ensure the confidentiality of the client updates, Federated Learning systems employ secure aggregation. We present RoFL, a secure Federated Learning system that improves robustness against malicious clients.
arXiv Detail & Related papers (2021-07-07T15:42:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.