Related papers: Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization

Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization

URL: http://arxiv.org/abs/2509.09942v2
Date: Sun, 12 Oct 2025 04:04:06 GMT
Title: Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization
Authors: Lei Yu, Jingyuan Zhang, Xin Wang, Jiajia Ma, Li Yang, Fengjun Zhang,
Abstract summary: We propose SmartCoder-R1, a framework for secure and explainable smart contract generation.<n>We train the model to emulate human security analysis.<n>SmartCoder-R1 establishes a new state of the art, achieving top performance across five key metrics.
Score: 18.013438474903314
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Smart contracts automate the management of high-value assets, where vulnerabilities can lead to catastrophic financial losses. This challenge is amplified in Large Language Models (LLMs) by two interconnected failures: they operate as unauditable "black boxes" lacking a transparent reasoning process, and consequently, generate code riddled with critical security vulnerabilities. To address both issues, we propose SmartCoder-R1 (based on Qwen2.5-Coder-7B), a novel framework for secure and explainable smart contract generation. It begins with Continual Pre-training (CPT) to specialize the model. We then apply Long Chain-of-Thought Supervised Fine-Tuning (L-CoT SFT) on 7,998 expert-validated reasoning-and-code samples to train the model to emulate human security analysis. Finally, to directly mitigate vulnerabilities, we employ Security-Aware Group Relative Policy Optimization (S-GRPO), a reinforcement learning phase that refines the generation policy by optimizing a weighted reward signal for compilation success, security compliance, and format correctness. Evaluated against 17 baselines on a benchmark of 756 real-world functions, SmartCoder-R1 establishes a new state of the art, achieving top performance across five key metrics: a ComPass of 87.70%, a VulRate of 8.60%, a SafeAval of 80.16%, a FuncRate of 53.84%, and a FullRate of 50.53%. This FullRate marks a 45.79% relative improvement over the strongest baseline, DeepSeek-R1. Crucially, its generated reasoning also excels in human evaluations, achieving high-quality ratings for Functionality (82.7%), Security (85.3%), and Clarity (90.7%).

Related papers

Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away [97.11976870616273]
We propose a lightweight inference-time defense that treats safety recovery as a satising constraint rather than an objective.<n>In our evaluations across six open-source MLRMs and four jailbreak benchmarks, SafeThink reduces attack success rates by 30-60%.
arXiv Detail & Related papers (2026-02-11T18:09:17Z)
Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model [60.60587869092729]
Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment.<n>We propose SecCoderX, an online reinforcement learning framework for functionality-preserving secure code generation.
arXiv Detail & Related papers (2026-02-07T07:42:07Z)
CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation.<n>Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency.<n>We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z)
Categorical Framework for Quantum-Resistant Zero-Trust AI Security [0.0]
We present a novel integration of post-quantum cryptography (PQC) and zero trust architecture (AZT) to secure AI model.<n>Our framework uniquely models cryptographic access as morphisms and trust policies as functors.<n>We demonstrate its efficacy through a concrete ESP32-based implementation.
arXiv Detail & Related papers (2025-11-25T17:17:24Z)
SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models [66.71948519280669]
Multimodal Large Reasoning Models (MLRMs) demonstrate impressive crossmodal reasoning but often amplify safety risks under adversarial prompts.<n> Existing defenses mainly act at the output level and do not constrain the reasoning process, leaving models to implicit risks.<n>We propose SaFeR-VLM, which integrates four components and supports dynamic and interpretable safety decisions beyond surface-level filtering.
arXiv Detail & Related papers (2025-10-08T10:39:12Z)
A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs [14.754759160027959]
Large Language Models (LLMs) significantly accelerate software development, but their frequent generation of insecure code presents serious risks.<n>We present a comprehensive evaluation of seven parameter-efficient fine-tuning (PEFT) techniques, demonstrating substantial gains in secure code generation without compromising functionality.<n>Our research identifies prompt-tuning as the most effective PEFT method, achieving an 80.86% Overall-Secure-Rate on CodeGen2 16B, a 13.5-point improvement over the 67.28% baseline.
arXiv Detail & Related papers (2025-09-16T04:09:41Z)
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality [41.04710068888387]
PRISM (Principled Reasoning for Integrated Safety in Multimodality) is a system2-like framework that aligns vision-language models (VLMs)<n>Our framework consists of two key components: PRISM-CoT, a dataset that teaches safety-aware chain-of-thought reasoning, and PRISM-DPO, generated via Monte Carlo Tree Search (MCTS)<n> Comprehensive evaluations demonstrate PRISM's effectiveness, achieving remarkably low attack success rates including 0.15% on JailbreakV-28K for Qwen2-VL and 90% improvement over the previous best method on VLBreak for LLaVA-1.5.
arXiv Detail & Related papers (2025-08-26T03:45:19Z)
MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning [0.0]
MalCodeAI is a language-agnostic pipeline for autonomous code security analysis and remediation.<n>It combines code decomposition and semantic reasoning using finetuned Qwen2.5-Coder-3B-Instruct models.<n>MalCodeAI supports red-hat-style exploit tracing, CVSS-based risk scoring, and zero-shot generalization to detect complex, zero-day vulnerabilities.
arXiv Detail & Related papers (2025-07-15T01:25:04Z)
Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z)
A Preference-Driven Methodology for High-Quality Solidity Code Generation [11.139579355590332]
We propose textbfmytitle, a novel framework that extends standard DPO beyond human preferences to incorporate quantifiable blockchain-specific metrics.<n>Our framework introduces a comprehensive evaluation methodology with four complementary metrics: Pass@k (functional correctness), Compile@k (syntactic correctness), Gas@k (gas efficiency), and Secure@k (security assessment)<n>Our framework significantly outperforms existing approaches across all critical dimensions, achieving 66.7% Pass@5, 58.9% Gas@5, and 62.5% Secure@5.
arXiv Detail & Related papers (2025-06-03T15:45:31Z)
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale [46.76144797837242]
Large language model (LLM) agents are becoming increasingly skilled at handling cybersecurity tasks autonomously.<n>Existing benchmarks fall short, often failing to capture real-world scenarios or being limited in scope.<n>We introduce CyberGym, a large-scale and high-quality cybersecurity evaluation framework featuring 1,507 real-world vulnerabilities.
arXiv Detail & Related papers (2025-06-03T07:35:14Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
Enhancing Smart Contract Vulnerability Detection in DApps Leveraging Fine-Tuned LLM [0.7018579932647147]
Decentralized applications (DApps) face significant security risks due to vulnerabilities in smart contracts.<n>This paper proposes a novel approach leveraging fine-tuned Large Language Models (LLMs) to enhance smart contract vulnerability detection.
arXiv Detail & Related papers (2025-04-07T12:32:14Z)
SmartLLM: Smart Contract Auditing using Custom Generative AI [0.0]
This paper introduces SmartLLM, a novel approach leveraging fine-tuned LLaMA 3.1 models with Retrieval-Augmented Generation (RAG)<n>By integrating domain-specific knowledge from ERC standards, SmartLLM achieves superior performance compared to static analysis tools like Mythril and Slither.<n> Experimental results demonstrate a perfect recall of 100% and an accuracy score of 70%, highlighting the model's robustness in identifying vulnerabilities.
arXiv Detail & Related papers (2025-02-17T06:22:05Z)
Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models [60.38983114420845]
We propose dual risk minimization (DRM) to better preserve the core features of downstream tasks.<n>DRM balances expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks.
arXiv Detail & Related papers (2024-11-29T15:01:25Z)
G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing. FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity. We present G$2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.