Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought
- URL: http://arxiv.org/abs/2402.12023v1
- Date: Mon, 19 Feb 2024 10:33:29 GMT
- Title: Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought
- Authors: Yuying Du and Xueyan Tang
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Smart contracts, as a key component of blockchain technology, play a crucial
role in ensuring the automation of transactions and adherence to protocol
rules. However, smart contracts are susceptible to security vulnerabilities,
which, if exploited, can lead to significant asset losses. This study explores
the potential of enhancing smart contract security audits using the GPT-4
model. We used a dataset of 35 smart contracts from the SolidiFI-benchmark
vulnerability library, containing 732 vulnerabilities, and compared GPT-4 with
five other vulnerability detection tools to evaluate its ability to identify
seven common types of vulnerabilities. Moreover, we assessed GPT-4's
performance in code parsing and vulnerability capture by simulating a
professional auditor's auditing process using Chain of Thought (CoT) prompts
based on the audit reports of eight groups of smart contracts. We also
evaluated GPT-4's ability to write Solidity Proofs of Concept (PoCs). Through
experimentation, we found that GPT-4 performed poorly in detecting smart
contract vulnerabilities: despite a high Precision of 96.6%, it achieved a low
Recall of 37.8% and an F1-score of 41.1%, indicating a tendency to miss
vulnerabilities during detection. Meanwhile, it demonstrated good contract code
parsing capabilities, with an average comprehensive score of 6.5, and was
capable of identifying the background information and functional relationships
of smart contracts; in 60% of the cases, it could write usable PoCs, suggesting
significant potential for PoC writing. These experimental results indicate
that GPT-4 lacks the ability to detect smart contract vulnerabilities
effectively, but its performance in contract code parsing and PoC writing
demonstrates its significant potential as an auxiliary tool for improving the
efficiency and effectiveness of smart contract security audits.
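The abstract reports Precision, Recall and F1-score side by side. As a sanity check on how these metrics relate, the short Python sketch below computes them from true-positive, false-positive and false-negative counts, with F1 defined as the harmonic mean 2PR / (P + R). The function names are illustrative and not taken from the paper. Plugging in the aggregate 96.6% Precision and 37.8% Recall gives an F1 of about 54.3% rather than the reported 41.1%, so the paper presumably aggregates F1 differently, for example by averaging per-vulnerability-type scores (the hypothetical `macro_f1` helper). The abstract does not state the method, so treat that as an assumption.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported findings that are real vulnerabilities."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real vulnerabilities that were reported."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_f1(per_type: list[tuple[int, int, int]]) -> float:
    """Unweighted mean of per-type F1 scores from (tp, fp, fn) triples.

    Hypothetical helper: one plausible way to aggregate scores across
    vulnerability types; the paper's exact method is an assumption here.
    """
    scores = [f1(precision(tp, fp), recall(tp, fn)) for tp, fp, fn in per_type]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Harmonic mean of the aggregate figures quoted in the abstract.
    print(f"{f1(0.966, 0.378):.3f}")  # prints 0.543
```

Averaging per-type F1 scores weights every vulnerability type equally, so a few types with very low recall can pull the average well below the harmonic mean of the pooled Precision and Recall.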
Related papers
- Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection
Decentralized Finance (DeFi) has been accompanied by substantial financial losses due to smart contract vulnerabilities.
With attacks becoming more frequent, the need and demand for auditing services have escalated.
This study builds upon existing frameworks by integrating Retrieval-Augmented Generation (RAG) with large language models (LLMs).
arXiv (2024-07-20T10:46:42Z)
- A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter
This paper introduces a novel context-driven prompting technique for smart contract co-auditing.
Our approach employs three techniques for context scoping and augmentation, encompassing code scoping to chunk long code into self-contained code segments.
Our method demonstrated a detection rate of 96% for vulnerable functions, outperforming the native prompting approach, which detected only 53%.
arXiv (2024-06-26T05:14:35Z)
- Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
This paper conducts a rigorous evaluation of GPT-4o against jailbreak attacks.
The newly introduced audio modality opens up new attack vectors for jailbreak attacks on GPT-4o.
Existing black-box multimodal jailbreak attack methods are largely ineffective against GPT-4o and GPT-4V.
arXiv (2024-06-10T14:18:56Z)
- Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
This study conducts the first thorough evaluation of leading Large Language Models (LLMs).
We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously.
arXiv (2024-03-18T01:38:19Z)
- Efficiently Detecting Reentrancy Vulnerabilities in Complex Smart Contracts
Existing vulnerability detection tools perform poorly in terms of efficiency and successful detection rates for vulnerabilities in complex contracts.
SliSE provides a robust and efficient method for detecting reentrancy vulnerabilities in complex contracts.
arXiv (2024-03-17T16:08:30Z)
- A Survey and Comparative Analysis of Security Properties of CAN Authentication Protocols
The Controller Area Network (CAN) bus leaves in-vehicle communications inherently non-secure.
This paper reviews and compares the 15 most prominent authentication protocols for the CAN bus.
We evaluate protocols based on essential operational criteria that contribute to ease of implementation.
arXiv (2024-01-19T14:52:04Z)
- Vulnerability Scanners for Ethereum Smart Contracts: A Large-Scale Study
In 2023 alone, such vulnerabilities led to substantial financial losses exceeding a billion US dollars.
Various tools have been developed to detect and mitigate vulnerabilities in smart contracts.
This study investigates the gap between the effectiveness of existing security scanners and the vulnerabilities that still persist in practice.
arXiv (2023-12-27T11:26:26Z)
- When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We?
We present an empirical study to investigate the performance of ChatGPT in identifying smart contract vulnerabilities.
ChatGPT achieves a high recall rate, but its precision in pinpointing smart contract vulnerabilities is limited.
Our research provides insights into the strengths and weaknesses of employing large language models, specifically ChatGPT, for the detection of smart contract vulnerabilities.
arXiv (2023-09-11T15:02:44Z)
- GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis
We propose GPTScan, the first tool combining GPT with static analysis for smart contract logic vulnerability detection.
By breaking down each logic vulnerability type into scenarios and properties, GPTScan matches candidate vulnerabilities with GPT.
It effectively detects ground-truth logic vulnerabilities with a recall of over 70%, including 9 new vulnerabilities missed by human auditors.
arXiv (2023-08-07T05:48:53Z)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv (2023-06-20T17:24:23Z)
- ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable.
We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts.
We show that ESCORT achieves an average F1-score of 95% on six vulnerability types and the detection time is 0.02 seconds per contract.
arXiv (2021-03-23T15:04:44Z)