SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows
- URL: http://arxiv.org/abs/2506.07313v1
- Date: Sun, 08 Jun 2025 23:08:08 GMT
- Title: SCGAgent: Recreating the Benefits of Reasoning Models for Secure Code Generation with Agentic Workflows
- Authors: Rebecca Saul, Hao Wang, Koushik Sen, David Wagner
- Abstract summary: Large language models (LLMs) have seen widespread success in code generation tasks for different scenarios. Despite producing functional code, current LLMs do not prioritize security and may generate code with exploitable vulnerabilities. We propose techniques for generating code that is more likely to be secure and introduce SCGAgent.
- Score: 8.546083810528502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have seen widespread success in code generation tasks for different scenarios, both everyday and professional. However, current LLMs, despite producing functional code, do not prioritize security and may generate code with exploitable vulnerabilities. In this work, we propose techniques for generating code that is more likely to be secure and introduce SCGAgent, a proactive secure coding agent that implements our techniques. We use security coding guidelines that articulate safe programming practices, combined with LLM-generated unit tests to preserve functional correctness. In our evaluation, we find that SCGAgent is able to preserve nearly 98% of the functionality of the base Sonnet-3.7 LLM while achieving an approximately 25% improvement in security. Moreover, SCGAgent is able to match or best the performance of sophisticated reasoning LLMs using a non-reasoning model and an agentic workflow.
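The abstract names the two key ingredients of the workflow: security coding guidelines that condition generation, and LLM-generated unit tests that guard functional correctness. A minimal sketch of how such an agentic loop could be wired together is below; the `call_llm` stub, the prompt wording, and the retry budget are illustrative assumptions, not SCGAgent's implementation.

```python
# Hypothetical sketch of a guideline-and-test driven secure coding loop.
# `call_llm` stands in for any chat-completion client; it is not SCGAgent's API.
import subprocess
import tempfile
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a non-reasoning Sonnet-3.7 endpoint)."""
    raise NotImplementedError

def run_unit_tests(code: str, tests: str) -> bool:
    """Run the LLM-generated tests against the candidate code in a scratch dir."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(code)
        Path(tmp, "test_solution.py").write_text(tests)
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=tmp, capture_output=True, timeout=60,
        )
        return result.returncode == 0

def secure_generate(task: str, guidelines: list[str], max_rounds: int = 3) -> str:
    """Generate code conditioned on security guidelines, then revise until tests pass."""
    guideline_block = "\n".join(f"- {g}" for g in guidelines)
    code = call_llm(f"Task: {task}\nFollow these secure-coding guidelines:\n{guideline_block}")
    tests = call_llm(f"Write pytest unit tests (importing from solution.py) for this task:\n{task}")
    for _ in range(max_rounds):
        if run_unit_tests(code, tests):
            return code  # functional and guideline-conditioned
        code = call_llm(
            f"The code below fails its unit tests. Fix it without violating "
            f"these guidelines:\n{guideline_block}\n\nCode:\n{code}"
        )
    return code  # best effort once the retry budget is exhausted
```

The essential design choice is that the unit tests, not the guidelines, gate acceptance: guidelines steer the model toward secure patterns, while failing tests trigger a revision that is explicitly constrained to keep the guidelines intact.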
Related papers
- Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z)
- RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
- STELP: Secure Transpilation and Execution of LLM-Generated Programs [2.986494009382113]
Large Language Models (LLMs) can solve software development-related tasks such as code generation. However, LLM-generated code can be unstable or erroneous and may contain vulnerabilities that lead to widespread system malfunctions. This paper proposes a Secure Transpiler and Executor of LLM-Generated Programs (STELP), capable of executing LLM-generated code in a controlled and safe manner.
arXiv Detail & Related papers (2026-01-09T01:49:41Z)
- RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents [70.24175620901538]
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters. Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios. We propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.
arXiv Detail & Related papers (2025-10-02T22:59:06Z)
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
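The entry names program-analysis feedback as the training signal; one plausible concretization is a scalar reward that combines a functional check with a static-analysis verdict. The sketch below is that reading, not the REAL paper's reward; the use of the Bandit analyzer and the 0.5 weighting are assumptions.

```python
# Illustrative reward shaping for "quality code" RL training (not REAL's code).
# Assumes the Bandit static analyzer is installed; its use here is our assumption.
import subprocess
import tempfile
from pathlib import Path

def passes_static_analysis(code: str) -> bool:
    """Return True if Bandit reports no findings (exit code 0)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["bandit", "-q", path], capture_output=True)
    Path(path).unlink()
    return result.returncode == 0

def reward(code: str, tests_pass: bool) -> float:
    """Combine functional and program-analysis signals into a scalar RL reward."""
    r = 1.0 if tests_pass else 0.0                       # functionality term
    r += 0.5 if passes_static_analysis(code) else -0.5   # quality/security term
    return r
```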
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique for aligning Large Language Models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies that leverages existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
- SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [58.65256663334316]
We present SafeAgentBench -- the first benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments. SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 9 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z)
- ProSec: Fortifying Code LLMs with Proactive Security Alignment [14.907702430331803]
Existing methods collect security-focused datasets from real-world vulnerabilities for instruction tuning. We propose ProSec, a novel proactive security alignment approach designed to align code LLMs with secure coding practices.
arXiv Detail & Related papers (2024-11-19T22:00:01Z)
- AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored. We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse. We find that leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z)
- PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) [4.2913589403278225]
Large language models (LLMs) are used to generate high-quality source code.
LLMs often introduce security vulnerabilities due to training on insecure open-source data.
This paper introduces PromSec, an algorithm for prompt optimization for secure and functioning code generation.
arXiv Detail & Related papers (2024-09-19T12:14:10Z)
- AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing [6.334110674473677]
Existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code.
We propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration.
Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation.
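Read literally, the entry describes a loop that alternates generation with static analysis and fuzzing until no findings remain. The control flow might look like the sketch below; the agent stubs, the `entry` function convention for the fuzz target, and the iteration budget are assumptions for illustration, not AutoSafeCoder's agents.

```python
# Sketch of a multi-agent generate/analyze/fuzz loop (our reading of the entry,
# not AutoSafeCoder's actual agents). Each "agent" is reduced to a function stub.
import random

def coder_agent(task: str, feedback: str = "") -> str:
    """LLM-backed code generator (stub)."""
    raise NotImplementedError

def static_agent(code: str) -> list[str]:
    """Return static-analysis findings; an empty list means clean (stub)."""
    raise NotImplementedError

def fuzz_agent(code: str, trials: int = 100) -> list[str]:
    """Feed random inputs to the candidate code and collect crash reports."""
    crashes = []
    ns: dict = {}
    exec(code, ns)  # assumes the task defines a function named `entry`
    for _ in range(trials):
        arg = random.randbytes(16)
        try:
            ns["entry"](arg)
        except Exception as exc:
            crashes.append(f"{type(exc).__name__} on input {arg!r}")
    return crashes

def auto_safe_code(task: str, max_iters: int = 5) -> str:
    """Iterate generation and testing until no static or dynamic issues remain."""
    code = coder_agent(task)
    for _ in range(max_iters):
        findings = static_agent(code) + fuzz_agent(code)
        if not findings:
            return code  # no static or dynamic issues remained
        code = coder_agent(task, feedback="\n".join(findings))  # repair step
    return code
```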
arXiv Detail & Related papers (2024-09-16T21:15:56Z)
- HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation.
Recent studies highlight that much LLM-generated code contains serious security vulnerabilities.
We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
- ShadowCode: Towards (Automatic) External Prompt Injection Attack against Code LLMs [56.46702494338318]
This paper introduces a new attack paradigm: (automatic) external prompt injection against code-oriented large language models. We propose ShadowCode, a simple yet effective method that automatically generates induced perturbations based on code simulation. We evaluate our method across 13 distinct malicious objectives, generating 31 threat cases spanning three popular programming languages.
arXiv Detail & Related papers (2024-07-12T10:59:32Z)
- GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
We propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. We show that GuardAgent effectively moderates the violation actions for different types of agents on two benchmarks, with guardrail accuracies over 98% and 83%, respectively.
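The summarized pipeline has two stages: parse a safety guard request into a task plan, then map that plan into executable guardrail code. A toy version of that mapping is sketched below; the `GuardRule` schema and `check_action` interface are invented for illustration and are not GuardAgent's representation (GuardAgent derives plans with an LLM).

```python
# Toy guardrail compiler: a guard request is first turned into a structured
# plan, then into an executable check. The plan schema below is invented.
from dataclasses import dataclass

@dataclass
class GuardRule:
    forbidden_tool: str        # e.g. "shell"
    forbidden_substring: str   # e.g. "rm -rf"

def plan_from_request(request: str) -> GuardRule:
    """Stand-in for the LLM step that parses a guard request into a plan."""
    # Example request: "block shell actions containing 'rm -rf'"
    tool = "shell" if "shell" in request else "any"
    frag = request.split("'")[1] if "'" in request else ""
    return GuardRule(forbidden_tool=tool, forbidden_substring=frag)

def compile_guardrail(rule: GuardRule):
    """Map the plan into guardrail code: a callable that vets agent actions."""
    def check_action(tool: str, argument: str) -> bool:
        if rule.forbidden_tool not in ("any", tool):
            return True  # rule does not apply to this tool
        return rule.forbidden_substring not in argument
    return check_action

allow = compile_guardrail(plan_from_request("block shell actions containing 'rm -rf'"))
assert allow("shell", "ls -la") and not allow("shell", "rm -rf /tmp/x")
```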
arXiv Detail & Related papers (2024-06-13T14:49:26Z)
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the safety of LLM-based agents.
The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that enhances safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection.
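The three strategies map naturally onto hooks around a planner, roughly as in the skeleton below; the hook names, the list-of-strings plan representation, and the keyword filter are illustrative assumptions rather than the TrustAgent API.

```python
# Skeleton of pre-/in-/post-planning safety hooks (our illustration of the
# three strategies, not TrustAgent's code). Plans are plain lists of step strings.
SAFETY_KNOWLEDGE = "Never delete user data; ask before irreversible actions."

def pre_planning(task: str) -> str:
    """Inject safety knowledge into the prompt before any plan is generated."""
    return f"{SAFETY_KNOWLEDGE}\n\nTask: {task}"

def in_planning(candidate_step: str) -> bool:
    """Filter unsafe steps while the plan is being generated (toy keyword check)."""
    return "delete" not in candidate_step.lower()

def post_planning(plan: list[str]) -> list[str]:
    """Inspect the finished plan and drop anything that slipped through."""
    return [step for step in plan if in_planning(step)]

def generate_plan(prompt: str) -> list[str]:
    """Stand-in for the LLM planner (stub)."""
    raise NotImplementedError

def safe_plan(task: str) -> list[str]:
    prompt = pre_planning(task)                                   # pre-planning
    steps = [s for s in generate_plan(prompt) if in_planning(s)]  # in-planning
    return post_planning(steps)                                   # post-planning
```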
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
- Code Security Vulnerability Repair Using Reinforcement Learning with Large Language Models [1.5457286059556397]
We propose a reinforcement learning-based method for security hardening and strengthening of generated code from Large Language Models (LLMs).
In this work, we propose a reinforcement learning-based method for program-specific repair with the combination of semantic and syntactic reward mechanisms that focus heavily on adding security and functional measures in the code, respectively.
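The core of the method as summarized is a reward that combines a semantic (security) term with a syntactic (functionality-preserving) term. A weighted-sum sketch of that combination follows; the scorer stubs and the 0.7/0.3 weights are invented examples, not the paper's reward design.

```python
# Hedged sketch of a combined RL reward for vulnerability repair.
# The scorers are stubs and the 0.7/0.3 weighting is an invented example.
def semantic_reward(original: str, repaired: str) -> float:
    """Score how much the repair improved security (stub, range [0, 1])."""
    raise NotImplementedError

def syntactic_reward(original: str, repaired: str) -> float:
    """Score how much functional structure was preserved (stub, range [0, 1])."""
    raise NotImplementedError

def combined_reward(original: str, repaired: str,
                    w_sem: float = 0.7, w_syn: float = 0.3) -> float:
    """Weighted sum of the security and functionality terms."""
    return (w_sem * semantic_reward(original, repaired)
            + w_syn * syntactic_reward(original, repaired))
```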
arXiv Detail & Related papers (2024-01-13T10:19:26Z)
- SALLM: Security Assessment of Generated Code [0.5137309756089941]
This paper describes SALLM, a framework to benchmark Large Language Models' abilities to generate secure code systematically.
The framework has three major components: a novel dataset of security-centric Python prompts, assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
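Given the three components named above, a minimal harness would iterate over the prompt dataset, sample completions, assess each one, and aggregate a metric. The sketch below computes one such statistic; the stubs and the all-samples-secure aggregation are our assumptions, not SALLM's assessment techniques or metrics.

```python
# Minimal benchmark harness in the spirit of the three components above:
# a prompt set, an assessment step, and an aggregate metric. All stubs are
# assumptions for illustration, not SALLM's implementation.
def generate(prompt: str, k: int) -> list[str]:
    """Sample k completions from the model under test (stub)."""
    raise NotImplementedError

def is_secure(code: str) -> bool:
    """Assessment step, e.g. static analysis or CWE-specific checks (stub)."""
    raise NotImplementedError

def secure_rate(prompts: list[str], k: int = 5) -> float:
    """Fraction of prompts for which every sampled completion is secure."""
    ok = sum(all(is_secure(c) for c in generate(p, k)) for p in prompts)
    return ok / len(prompts)
```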
arXiv Detail & Related papers (2023-11-01T22:46:31Z)