Related papers: Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought

URL: http://arxiv.org/abs/2402.12023v1
Date: Mon, 19 Feb 2024 10:33:29 GMT
Title: Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought
Authors: Yuying Du and Xueyan Tang
Abstract summary: This study explores the potential of enhancing smart contract security audits using the GPT-4 model. We utilized a dataset of 35 smart contracts from the SolidiFI-benchmark vulnerability library, containing 732 vulnerabilities. We found GPT-4 performed poorly in detecting smart contract vulnerabilities, with a high Precision of 96.6%, but a low Recall of 37.8%, and an F1-score of 41.1%.
Score: 8.04987973069845
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Smart contracts, as a key component of blockchain technology, play a crucial role in ensuring the automation of transactions and adherence to protocol rules. However, smart contracts are susceptible to security vulnerabilities, which, if exploited, can lead to significant asset losses. This study explores the potential of enhancing smart contract security audits using the GPT-4 model. We utilized a dataset of 35 smart contracts from the SolidiFI-benchmark vulnerability library, containing 732 vulnerabilities, and compared it with five other vulnerability detection tools to evaluate GPT-4's ability to identify seven common types of vulnerabilities. Moreover, we assessed GPT-4's performance in code parsing and vulnerability capture by simulating a professional auditor's auditing process using CoT (Chain of Thought) prompts based on the audit reports of eight groups of smart contracts. We also evaluated GPT-4's ability to write Solidity Proof of Concepts (PoCs). Through experimentation, we found that GPT-4 performed poorly in detecting smart contract vulnerabilities, with a high Precision of 96.6%, but a low Recall of 37.8%, and an F1-score of 41.1%, indicating a tendency to miss vulnerabilities during detection. Meanwhile, it demonstrated good contract code parsing capabilities, with an average comprehensive score of 6.5, capable of identifying the background information and functional relationships of smart contracts; in 60% of the cases, it could write usable PoCs, suggesting GPT-4 has significant potential application in PoC writing. These experimental results indicate that GPT-4 lacks the ability to detect smart contract vulnerabilities effectively, but its performance in contract code parsing and PoC writing demonstrates its significant potential as an auxiliary tool in enhancing the efficiency and effectiveness of smart contract security audits.
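
Note on the metrics: the abstract quotes Precision, Recall, and F1-score together. As a minimal sketch of how these relate, the snippet below computes the standard F1 (the harmonic mean of precision and recall) from the two aggregate figures quoted above. The function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Aggregate figures quoted in the abstract (96.6% and 37.8%).
reported_precision = 0.966
reported_recall = 0.378

combined = f1_score(reported_precision, reported_recall)
print(f"F1 from aggregate precision/recall: {combined:.3f}")  # prints 0.543
```

The value obtained this way (about 54.3%) is higher than the 41.1% F1-score quoted in the abstract. One possible explanation, which is an assumption here and not something the abstract states, is that the reported F1 was averaged over the seven vulnerability types instead of being derived from the aggregate precision and recall.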

Related papers

SmartBugBert: BERT-Enhanced Vulnerability Detection for Smart Contract Bytecode [0.7018579932647147]
This paper introduces SmartBugBert, a novel approach that combines BERT-based deep learning with control flow graph (CFG) analysis to detect vulnerabilities directly from bytecode. Our method first decompiles smart contract bytecode into optimized opcode sequences, extracts semantic features using TF-IDF, constructs control flow graphs to capture execution logic, and isolates vulnerable CFG fragments for targeted analysis.
arXiv Detail & Related papers (2025-04-07T12:30:12Z)
Benchmarking Reasoning Robustness in Large Language Models [76.79744000300363]
We find significant performance degradation on novel or incomplete data. These findings highlight the reliance on recall over rigorous logical inference. This paper introduces a novel benchmark, termed Math-RoB, that exploits hallucinations triggered by missing information to expose reasoning gaps.
arXiv Detail & Related papers (2025-03-06T15:36:06Z)
Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection [0.0]
We present SimilarGPT, a vulnerability identification tool for smart contracts. The main concept of SimilarGPT is to measure the similarity between the code under inspection and the secure code from third-party libraries. We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection.
arXiv Detail & Related papers (2024-12-24T07:15:48Z)
Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection [0.0]
Decentralized Finance (DeFi) has been accompanied by substantial financial losses due to smart contract vulnerabilities. With attacks becoming more frequent, the necessity and demand for auditing services have escalated. This study builds upon existing frameworks by integrating Retrieval-Augmented Generation (RAG) with large language models (LLMs).
arXiv Detail & Related papers (2024-07-20T10:46:42Z)
A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter [15.28361088402754]
This paper introduces a novel context-driven prompting technique for smart contract co-auditing. Our approach employs three techniques for context scoping and augmentation, encompassing code scoping to chunk long code into self-contained code segments. Our method demonstrated a detection rate of 96% for vulnerable functions, outperforming the native prompting approach, which detected only 53%.
arXiv Detail & Related papers (2024-06-26T05:14:35Z)
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks [65.84623493488633]
This paper conducts a rigorous evaluation of GPT-4o against jailbreak attacks. The newly introduced audio modality opens up new attack vectors for jailbreak attacks on GPT-4o. Existing black-box multimodal jailbreak attack methods are largely ineffective against GPT-4o and GPT-4V.
arXiv Detail & Related papers (2024-06-10T14:18:56Z)
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression [109.23761449840222]
This study conducts the first, thorough evaluation of leading Large Language Models (LLMs). We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously.
arXiv Detail & Related papers (2024-03-18T01:38:19Z)
Efficiently Detecting Reentrancy Vulnerabilities in Complex Smart Contracts [35.26195628798847]
Existing vulnerability detection tools perform poorly in terms of efficiency and successful detection rates for vulnerabilities in complex contracts. SliSE provides a robust and efficient method for detection of Reentrancy vulnerabilities for complex contracts.
arXiv Detail & Related papers (2024-03-17T16:08:30Z)
A Survey and Comparative Analysis of Security Properties of CAN Authentication Protocols [92.81385447582882]
The Controller Area Network (CAN) bus leaves in-vehicle communications inherently non-secure. This paper reviews and compares the 15 most prominent authentication protocols for the CAN bus. We evaluate protocols based on essential operational criteria that contribute to ease of implementation.
arXiv Detail & Related papers (2024-01-19T14:52:04Z)
Vulnerability Scanners for Ethereum Smart Contracts: A Large-Scale Study [44.25093111430751]
In 2023 alone, such vulnerabilities led to substantial financial losses exceeding a billion US dollars. Various tools have been developed to detect and mitigate vulnerabilities in smart contracts. This study investigates the gap between the effectiveness of existing security scanners and the vulnerabilities that still persist in practice.
arXiv Detail & Related papers (2023-12-27T11:26:26Z)
When ChatGPT Meets Smart Contract Vulnerability Detection: How Far Are We? [34.61179425241671]
We present an empirical study to investigate the performance of ChatGPT in identifying smart contract vulnerabilities. ChatGPT achieves a high recall rate, but its precision in pinpointing smart contract vulnerabilities is limited. Our research provides insights into the strengths and weaknesses of employing large language models, specifically ChatGPT, for the detection of smart contract vulnerabilities.
arXiv Detail & Related papers (2023-09-11T15:02:44Z)
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis [26.081673382969615]
We propose GPTScan, the first tool combining GPT with static analysis for smart contract logic vulnerability detection. By breaking down each logic vulnerability type into scenarios and properties, GPTScan matches candidate vulnerabilities with GPT. It effectively detects ground-truth logic vulnerabilities with a recall of over 70%, including 9 new vulnerabilities missed by human auditors.
arXiv Detail & Related papers (2023-08-07T05:48:53Z)
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5. We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning [80.85273827468063]
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable. We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts. We show that ESCORT achieves an average F1-score of 95% on six vulnerability types and the detection time is 0.02 seconds per contract.
arXiv Detail & Related papers (2021-03-23T15:04:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.