Related papers: FLAMES: Fine-tuning LLMs to Synthesize Invariants for Smart Contract Security

FLAMES: Fine-tuning LLMs to Synthesize Invariants for Smart Contract Security

URL: http://arxiv.org/abs/2510.21401v1
Date: Fri, 24 Oct 2025 12:44:08 GMT
Title: FLAMES: Fine-tuning LLMs to Synthesize Invariants for Smart Contract Security
Authors: Mojtaba Eshghie, Gabriele Morello, Matteo Lauretano, Alexandre Bartel, Martin Monperrus,
Abstract summary: FLAMES is an automated approach that synthesizes runtime guards as Solidity "require" statements to harden smart contracts against exploits.<n>FLAMES employs domain-adapted large language models trained through fill-in-the-middle supervised fine-tuning on real-world invariants extracted from 514,506 verified contracts.
Score: 41.836337574143535
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Smart contract vulnerabilities cost billions of dollars annually, yet existing automated analysis tools fail to generate deployable defenses. We present FLAMES, a novel automated approach that synthesizes executable runtime guards as Solidity "require" statements to harden smart contracts against exploits. Unlike prior work that relies on vulnerability labels, symbolic analysis, or natural language specifications, FLAMES employs domain-adapted large language models trained through fill-in-the-middle supervised fine-tuning on real-world invariants extracted from 514,506 verified contracts. Our extensive evaluation across three dimensions demonstrates FLAMES's effectiveness: (1) Compilation: FLAMES achieves 96.7% compilability for synthesized invariant (2) Semantic Quality: on a curated test set of 5,000 challenging invariants, FLAMES produces exact or semantically equivalent matches to ground truth in 44.5% of cases; (3) Exploit Mitigation: FLAMES prevents 22 out of 108 real exploits (20.4%) while preserving contract functionality, and (4) FLAMES successfully blocks the real-world APEMAGA incident by synthesizing a pre-condition that mitigates the attack. FLAMES establishes that domain-adapted LLMs can automatically generate production-ready security defenses for smart contracts without requiring vulnerability detection, formal specifications, or human intervention. We release our code, model weights, datasets, and evaluation infrastructure to enable reproducible research in this critical domain.

Related papers

RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area.<n>We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
Prompt Engineering vs. Fine-Tuning for LLM-Based Vulnerability Detection in Solana and Algorand Smart Contracts [1.0255673932966183]
This paper investigates the capability of large language models (LLMs) to detect vulnerabilities in smart contracts.<n>We focus on the smart contract ecosystem of Solana and Algorand.<n>Our findings suggest that LLM-based approaches are viable for static vulnerability detection in smart contracts.
arXiv Detail & Related papers (2025-11-14T12:50:36Z)
ParaVul: A Parallel Large Language Model and Retrieval-Augmented Framework for Smart Contract Vulnerability Detection [43.41293570032631]
ParaVul is a retrieval-augmented framework to improve the reliability and accuracy of smart contract vulnerability detection.<n>We develop Sparse Low-Rank Adaptation (SLoRA) for LLM fine-tuning.<n>We construct a vulnerability contract dataset and develop a hybrid Retrieval-Augmented Generation (RAG) system.
arXiv Detail & Related papers (2025-10-20T03:23:41Z)
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models [50.21378052667732]
We conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics.<n>We propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach.
arXiv Detail & Related papers (2025-09-29T05:17:10Z)
Prompt to Pwn: Automated Exploit Generation for Smart Contracts [7.808685501356819]
We present textscReX, a framework integrating LLM-based exploit synthesis with the Foundry testing suite.<n>We evaluate five state-of-the-art LLMs on both synthetic benchmarks and real-world smart contracts affected by known high-impact exploits.<n>Our results show that modern LLMs can reliably generate functional PoC exploits for diverse vulnerability types, with success rates reaching up to 92%.
arXiv Detail & Related papers (2025-08-02T13:52:15Z)
LLAMA: Multi-Feedback Smart Contract Fuzzing Framework with LLM-Guided Seed Generation [56.84049855266145]
We propose a Multi-feedback Smart Contract Fuzzing framework (LLAMA) that integrates evolutionary mutation strategies, and hybrid testing techniques.<n>LLAMA achieves 91% instruction coverage and 90% branch coverage, while detecting 132 out of 148 known vulnerabilities.<n>These results highlight LLAMA's effectiveness, adaptability, and practicality in real-world smart contract security testing scenarios.
arXiv Detail & Related papers (2025-07-16T09:46:58Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
AegisLLM: Scaling Agentic Systems for Self-Reflective Defense in LLM Security [74.22452069013289]
AegisLLM is a cooperative multi-agent defense against adversarial attacks and information leakage.<n>We show that scaling agentic reasoning system at test-time substantially enhances robustness without compromising model utility.<n> Comprehensive evaluations across key threat scenarios, including unlearning and jailbreaking, demonstrate the effectiveness of AegisLLM.
arXiv Detail & Related papers (2025-04-29T17:36:05Z)
MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models [16.16186929130931]
Smart contract vulnerabilities pose significant security risks to blockchain systems.<n>We propose a smart contract vulnerability detection framework based on mixture-of-experts tuning (MOE-Tuning) of large language models.<n> Experiments show that MOS significantly outperforms existing methods with average improvements of 6.32% in F1 score and 4.80% in accuracy.
arXiv Detail & Related papers (2025-04-16T16:33:53Z)
Enhancing Smart Contract Vulnerability Detection in DApps Leveraging Fine-Tuned LLM [0.7018579932647147]
Decentralized applications (DApps) face significant security risks due to vulnerabilities in smart contracts.<n>This paper proposes a novel approach leveraging fine-tuned Large Language Models (LLMs) to enhance smart contract vulnerability detection.
arXiv Detail & Related papers (2025-04-07T12:32:14Z)
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [50.980446687774645]
We introduce ADV-LLM, an iterative self-tuning process that crafts adversarial LLMs with enhanced jailbreak ability.<n>Our framework significantly reduces the computational cost of generating adversarial suffixes while achieving nearly 100% ASR on various open-source LLMs.<n>It exhibits strong attack transferability to closed-source models, achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4, despite being optimized solely on Llama3.
arXiv Detail & Related papers (2024-10-24T06:36:12Z)
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning [20.463200377413255]
We introduce a unified evaluation framework that assesses large language models' vulnerability reasoning capabilities.<n>We test six representative LLMs for 147 ground-truth vulnerabilities and 147 non-vulnerable cases in 3,528 controlled scenarios.<n>Our findings reveal the varying impacts of knowledge enhancement, context supplementation, and prompt schemes.
arXiv Detail & Related papers (2024-01-29T14:32:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.