SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models
- URL: http://arxiv.org/abs/2502.07644v2
- Date: Wed, 12 Feb 2025 05:18:48 GMT
- Title: SymGPT: Auditing Smart Contracts via Combining Symbolic Execution with Large Language Models
- Authors: Shihao Xia, Mengting He, Shuai Shao, Tingting Yu, Yiying Zhang, Linhai Song
- Abstract summary: This paper introduces SymGPT, a tool that combines the natural language understanding of large language models (LLMs) with the formal guarantees of symbolic execution.
We conduct an empirical study of 132 ERC rules from three widely used ERC standards, examining their content, security implications, and natural language descriptions.
We then synthesize constraints from the formalized rules to represent scenarios where violations may occur and use symbolic execution to detect them.
Our evaluation shows that SymGPT identifies 5,783 ERC rule violations in 4,000 real-world contracts, including 1,375 violations with clear attack paths for stealing financial assets.
- Score: 8.697080709545352
- Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each having a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to manually audit each single contract, use expert-developed program-analysis tools, or use large language models (LLMs), all of which are far from effective in identifying ERC rule violations. This paper introduces SymGPT, a tool that combines the natural language understanding of large language models (LLMs) with the formal guarantees of symbolic execution to automatically verify smart contracts' compliance with ERC rules. To develop SymGPT, we conduct an empirical study of 132 ERC rules from three widely used ERC standards, examining their content, security implications, and natural language descriptions. Based on this study, we design SymGPT by first instructing an LLM to translate ERC rules into a defined EBNF grammar. We then synthesize constraints from the formalized rules to represent scenarios where violations may occur and use symbolic execution to detect them. Our evaluation shows that SymGPT identifies 5,783 ERC rule violations in 4,000 real-world contracts, including 1,375 violations with clear attack paths for stealing financial assets, demonstrating its effectiveness. Furthermore, SymGPT outperforms six automated techniques and a security-expert auditing service, underscoring its superiority over current smart contract analysis methods.
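The pipeline outlined in the abstract (formalize a rule, derive the condition under which it would be violated, then search for an execution that meets that condition) can be illustrated with a toy sketch. This is not SymGPT's implementation: the rule encoding, the names `ThrowRule` and `find_violation`, and the two toy `transfer` functions are all hypothetical, and exhaustive enumeration over a tiny input range stands in for real symbolic execution. The rule shown is the ERC-20 requirement that `transfer` should throw when the caller's balance is insufficient.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Ledger = Dict[str, int]  # toy token state: address -> balance


@dataclass(frozen=True)
class ThrowRule:
    """Hypothetical formalized rule: the function must throw whenever `condition` holds."""
    function: str
    condition: Callable[[Ledger, str, int], bool]


def buggy_transfer(balances: Ledger, sender: str, to: str, value: int) -> Ledger:
    """Toy transfer that omits the balance check, so it violates the rule."""
    updated = dict(balances)
    updated[sender] = updated.get(sender, 0) - value
    updated[to] = updated.get(to, 0) + value
    return updated


def safe_transfer(balances: Ledger, sender: str, to: str, value: int) -> Ledger:
    """Toy transfer that throws on insufficient balance, so it satisfies the rule."""
    if balances.get(sender, 0) < value:
        raise ValueError("insufficient balance")
    updated = dict(balances)
    updated[sender] = updated.get(sender, 0) - value
    updated[to] = updated.get(to, 0) + value
    return updated


def find_violation(rule: ThrowRule, transfer, max_value: int = 3) -> Optional[Tuple[int, int]]:
    """Search small inputs for a case where the rule's condition holds but no throw occurs.

    Assumption: brute-force enumeration replaces the constraint solving a
    symbolic executor would perform. Returns (balance, value) as a witness, or None.
    """
    for balance, value in product(range(max_value + 1), repeat=2):
        balances = {"alice": balance, "bob": 0}
        if not rule.condition(balances, "alice", value):
            continue  # rule does not demand a throw for this input
        try:
            transfer(balances, "alice", "bob", value)
        except ValueError:
            continue  # the function threw, as the rule requires
        return balance, value  # condition held and nothing was thrown
    return None


# ERC-20: transfer should throw if the caller's balance is lower than the value sent.
RULE = ThrowRule(
    function="transfer",
    condition=lambda balances, sender, value: balances.get(sender, 0) < value,
)

if __name__ == "__main__":
    print("buggy:", find_violation(RULE, buggy_transfer))
    print("safe:", find_violation(RULE, safe_transfer))
```

In this sketch a non-`None` result is a concrete witness (starting balance and transfer amount) for a rule violation, loosely analogous to the violating scenarios the abstract says are detected through symbolic execution.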
Related papers
- SmartLLM: Smart Contract Auditing using Custom Generative AI [0.0]
This paper introduces SmartLLM, a novel approach leveraging fine-tuned LLaMA 3.1 models with Retrieval-Augmented Generation (RAG).
By integrating domain-specific knowledge from ERC standards, SmartLLM achieves superior performance compared to static analysis tools like Mythril and Slither.
Experimental results demonstrate a perfect recall of 100% and an accuracy score of 70%, highlighting the model's robustness in identifying vulnerabilities.
arXiv Detail & Related papers (2025-02-17T06:22:05Z) - SmartLLMSentry: A Comprehensive LLM Based Smart Contract Vulnerability Detection Framework [0.0]
This paper introduces SmartLLMSentry, a novel framework that leverages large language models (LLMs) to advance smart contract vulnerability detection.
We created a specialized dataset of five randomly selected vulnerabilities for model training and evaluation.
Our results show an exact match accuracy of 91.1% with sufficient data, although GPT-4 demonstrated reduced performance compared to GPT-3 in rule generation.
arXiv Detail & Related papers (2024-11-28T16:02:01Z) - SC-Bench: A Large-Scale Dataset for Smart Contract Auditing [5.787866021952808]
We present SC-Bench, the first dataset for automated smart-contract auditing research.
SC-Bench consists of 5,377 real-world smart contracts and 15,975 violations of standards on Ethereum called ERCs.
We evaluate SC-Bench using GPT-4 by prompting it with both the contracts and ERC rules.
Our results show that without the oracle, GPT-4 can only detect 0.9% of violations, and with the oracle, it detects 22.9% of violations.
arXiv Detail & Related papers (2024-10-08T16:23:50Z) - SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations.
First, existing methods often use coarse-grained taxonomies of unsafe topics, and over-represent some fine-grained topics.
Second, linguistic characteristics and formatting of prompts are often overlooked, like different languages, dialects, and more -- which are only implicitly considered in many evaluations.
Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z) - All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts [24.881450403784786]
Vulnerabilities in the process of address verification can lead to great security issues.
We design and implement AVVERIFIER, a lightweight taint analyzer based on static EVM opcode simulation.
After a large-scale evaluation of over 5 million smart contracts, we have identified 812 vulnerable smart contracts that were undisclosed by our community.
arXiv Detail & Related papers (2024-05-31T01:02:07Z) - Learnable Item Tokenization for Generative Recommendation [78.30417863309061]
We propose LETTER (a LEarnable Tokenizer for generaTivE Recommendation), which integrates hierarchical semantics, collaborative signals, and code assignment diversity.
LETTER incorporates Residual Quantized VAE for semantic regularization, a contrastive alignment loss for collaborative regularization, and a diversity loss to mitigate code assignment bias.
arXiv Detail & Related papers (2024-05-12T15:49:38Z) - AuditGPT: Auditing Smart Contracts with ChatGPT [8.697080709545352]
AuditGPT is a tool to automatically and comprehensively verify ERC rules against smart contracts.
It pinpoints 418 ERC rule violations and only reports 18 false positives, showcasing its effectiveness and accuracy.
arXiv Detail & Related papers (2024-04-05T07:19:13Z) - Intention Analysis Makes LLMs A Good Jailbreak Defender [79.4014719271075]
We present a simple yet highly effective defense strategy, i.e., Intention Analysis ($\mathbb{IA}$).
$\mathbb{IA}$ works by triggering LLMs' inherent self-correct and improve ability through a two-stage process.
Experiments on varying jailbreak benchmarks show that $\mathbb{IA}$ could consistently and significantly reduce the harmfulness in responses.
arXiv Detail & Related papers (2024-01-12T13:15:05Z) - Vulnerability Scanners for Ethereum Smart Contracts: A Large-Scale Study [44.25093111430751]
In 2023 alone, such vulnerabilities led to substantial financial losses exceeding a billion US dollars.
Various tools have been developed to detect and mitigate vulnerabilities in smart contracts.
This study investigates the gap between the effectiveness of existing security scanners and the vulnerabilities that still persist in practice.
arXiv Detail & Related papers (2023-12-27T11:26:26Z) - Blockchain Large Language Models [65.7726590159576]
This paper presents a dynamic, real-time approach to detecting anomalous blockchain transactions.
The proposed tool, BlockGPT, generates tracing representations of blockchain activity and trains from scratch a large language model to act as a real-time Intrusion Detection System.
arXiv Detail & Related papers (2023-04-25T11:56:18Z) - ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning [80.85273827468063]
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable.
We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts.
We show that ESCORT achieves an average F1-score of 95% on six vulnerability types and the detection time is 0.02 seconds per contract.
arXiv Detail & Related papers (2021-03-23T15:04:44Z)