Related papers: Prompt, Synthesize, Fine-Tune: A Secure Code Generation Recipe

URL: http://arxiv.org/abs/2510.07189v1
Date: Wed, 08 Oct 2025 16:24:09 GMT
Title: Prompt, Synthesize, Fine-Tune: A Secure Code Generation Recipe
Authors: Junjie Li, Fazle Rabbi, Bo Yang, Song Wang, Jinqiu Yang
Abstract summary: We present Secure-Instruct, a framework that automatically synthesizes high-quality vulnerable and secure code examples. We find that Secure-Instruct improves not only the security but also the functional correctness of the generated code.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Although Large Language Models (LLMs) show promise for automated code generation, they often produce insecure code that threatens software security. Current approaches (e.g., SafeCoder) to improve secure code generation suffer from limited and imbalanced datasets, reducing their effectiveness and generalizability. In this work, we present Secure-Instruct, a novel framework that automatically synthesizes high-quality vulnerable and secure code examples, generates fine-tuning instructions, and instruction-tunes LLMs to align task description and secure code generation abilities. We evaluate Secure-Instruct on four representative LLMs using two benchmarks: our own CWEBench and the existing CWEval. CWEBench comprises 93 scenarios on 44 CWEs, all without overlap with Secure-Instruct's synthetic instruction-tuning dataset, while CWEval covers 31 CWEs with 119 manually verified security-critical tasks. We find that Secure-Instruct improves not only the security but also the functional correctness of the generated code. On CWEBench, Secure-Instruct substantially improves secure code generation, giving a 14.3% average increase in secure ratio over the pretrained models and outperforming SafeCoder by 7.6%. On CWEval, Secure-Instruct achieves a 14% increase for CodeLlama-7B and 5.8% for Mistral-7B in Func-Sec@1 over pretrained models, and surpasses SafeCoder by 15.8% and 6.8%, respectively.

Related papers

Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z)
CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability [50.57373283154859]
We present CVE-Factory, the first multiagent framework to achieve expert-level quality in automatically transforming vulnerability tasks. It is also evaluated on the latest realistic vulnerabilities and achieves a 66.2% verified success. We synthesize over 1,000 executable training environments, the first large-scale scaling of agentic tasks in code security.
arXiv Detail & Related papers (2026-02-03T02:27:16Z)
RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space [91.99501941169831]
GuardSpace is a guardrail framework for preserving safety alignment throughout fine-tuning. For Llama-2-7B-Chat fine-tuned on GSM8K, GuardSpace outperforms the state-of-the-art method AsFT.
arXiv Detail & Related papers (2025-10-16T04:57:53Z)
SmartCoder-R1: Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization [18.013438474903314]
We propose SmartCoder-R1, a framework for secure and explainable smart contract generation. We train the model to emulate human security analysis. SmartCoder-R1 establishes a new state of the art, achieving top performance across five key metrics.
arXiv Detail & Related papers (2025-09-12T03:14:50Z)
SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law [91.33824439029533]
We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. It is developed by our proposed SafeLadder framework, which incorporates large-scale, progressive, safety-oriented reinforcement learning post-training. We further develop SafeWork-R1-InternVL3-78B, SafeWork-R1-DeepSeek-70B, and SafeWork-R1-Qwen2.5VL-7B.
arXiv Detail & Related papers (2025-07-24T16:49:19Z)
SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows [8.546083810528502]
Large language models (LLMs) have seen widespread success in code generation tasks for different scenarios. Despite producing functional code, current LLMs do not prioritize security and may generate code with exploitable vulnerabilities. We propose techniques for generating code that is more likely to be secure and introduce SCGAgent.
arXiv Detail & Related papers (2025-06-08T23:08:08Z)
Safety Pretraining: Toward the Next Generation of Safe AI [68.99129474671282]
We present a data-centric pretraining framework that builds safety into the model from the start. Our framework consists of four key steps: Safety Filtering, Safety Rephrasing, Native Refusal and Harmfulness-Tag annotated pretraining. Our safety-pretrained models reduce attack success rates from 38.8% to 8.4% on standard LLM safety benchmarks with no performance degradation on general tasks.
arXiv Detail & Related papers (2025-04-23T17:58:08Z)
ProSec: Fortifying Code LLMs with Proactive Security Alignment [14.907702430331803]
Existing methods collect security-focused datasets from real-world vulnerabilities for instruction tuning. We propose ProSec, a novel proactive security alignment approach designed to align code LLMs with secure coding practices.
arXiv Detail & Related papers (2024-11-19T22:00:01Z)
SeCodePLT: A Unified Platform for Evaluating the Security of Code GenAI [58.29510889419971]
Existing benchmarks for evaluating the security risks and capabilities of code-generating large language models (LLMs) face several key limitations. We introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Applying this framework to Python, C/C++, and Java, we build SeCodePLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities.
arXiv Detail & Related papers (2024-10-14T21:17:22Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that much LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.