A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter
- URL: http://arxiv.org/abs/2406.18075v1
- Date: Wed, 26 Jun 2024 05:14:35 GMT
- Title: A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter
- Authors: Mohamed Salah Bouafif, Chen Zheng, Ilham Ahmed Qasse, Ed Zulkoski, Mohammad Hamdaqa, Foutse Khomh
- Abstract summary: This paper introduces a novel context-driven prompting technique for smart contract co-auditing.
Our approach employs three techniques for context scoping and augmentation: code scoping, which chunks long code into self-contained segments based on inter-dependencies; assessment scoping, which tailors the context description to the target assessment goal; and reporting scoping, which forces a specific format for the generated response.
Our method demonstrated a detection rate of 96% for vulnerable functions, outperforming the native prompting approach, which detected only 53%.
- Score: 15.28361088402754
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The surge in the adoption of smart contracts necessitates rigorous auditing to ensure their security and reliability. Manual auditing, although comprehensive, is time-consuming and heavily reliant on the auditor's expertise. With the rise of Large Language Models (LLMs), there is growing interest in leveraging them to assist auditors in the auditing process (co-auditing). However, the effectiveness of LLMs in smart contract co-auditing is contingent upon the design of the input prompts, especially in terms of context description and code length. This paper introduces a novel context-driven prompting technique for smart contract co-auditing. Our approach employs three techniques for context scoping and augmentation, encompassing code scoping to chunk long code into self-contained code segments based on code inter-dependencies, assessment scoping to enhance context description based on the target assessment goal, thereby limiting the search space, and reporting scoping to force a specific format for the generated response. Through empirical evaluations on publicly available vulnerable contracts, our method demonstrated a detection rate of 96% for vulnerable functions, outperforming the native prompting approach, which detected only 53%. To assess the reliability of our prompting approach, manual analysis of the results was conducted by expert auditors from our partner, Quantstamp, a world-leading smart contract auditing company. The experts' analysis indicates that, in unlabeled datasets, our proposed approach enhances the proficiency of the GPT-4 code interpreter in detecting vulnerabilities.
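To make the three scoping steps concrete, below is a minimal, hypothetical sketch of how such a context-driven co-auditing prompt could be assembled. The helper names, the naive function-boundary chunking, and the prompt wording are illustrative assumptions, not the authors' implementation; only the three-step structure (code scoping, assessment scoping, reporting scoping) comes from the abstract.

```python
# Hypothetical sketch of the paper's three scoping ideas; the authors'
# actual chunking rules and prompts are not given in the abstract.
# Assumes an OpenAI-style chat-completion client.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def code_scope(contract_source: str, max_chars: int = 6000) -> list[str]:
    """Code scoping: split long code into self-contained segments.

    The paper chunks by code inter-dependencies; as a stand-in, this
    sketch splits naively on top-level function boundaries and packs
    functions into segments under a size budget, repeating the contract
    header so each segment stays self-contained.
    """
    parts = contract_source.split("\n    function ")
    header, functions = parts[0], parts[1:]
    segments, current = [], header
    for fn in functions:
        if len(current) + len(fn) > max_chars:
            segments.append(current)
            current = header
        current += "\n    function " + fn
    segments.append(current)
    return segments


def build_prompt(segment: str, assessment_goal: str) -> str:
    """Assessment scoping narrows the search space to one goal;
    reporting scoping forces a fixed response format."""
    return (
        f"You are auditing a Solidity contract for: {assessment_goal}. "
        "Report only issues of this type.\n"                # assessment scoping
        "Answer strictly as a JSON list of objects with keys "
        "'function', 'line', 'issue', and 'severity'.\n\n"  # reporting scoping
        f"Solidity code:\n{segment}"
    )


def co_audit(contract_source: str, assessment_goals: list[str]) -> list[str]:
    """Run one scoped prompt per (segment, goal) pair and collect replies."""
    findings = []
    for segment in code_scope(contract_source):
        for goal in assessment_goals:
            reply = client.chat.completions.create(
                model="gpt-4",
                messages=[{"role": "user", "content": build_prompt(segment, goal)}],
            )
            findings.append(reply.choices[0].message.content)
    return findings
```

Usage would look like `co_audit(source, ["reentrancy", "integer overflow"])`; iterating over one assessment goal at a time is what limits the search space per query.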
Related papers
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
arXiv Detail & Related papers (2024-07-23T15:31:26Z)
- Retrieval Augmented Generation Integrated Large Language Models in Smart Contract Vulnerability Detection [0.0]
Decentralized Finance (DeFi) has been accompanied by substantial financial losses due to smart contract vulnerabilities.
With attacks becoming more frequent, the necessity and demand for auditing services have escalated.
This study builds upon existing frameworks by integrating Retrieval-Augmented Generation (RAG) with large language models (LLMs).
arXiv Detail & Related papers (2024-07-20T10:46:42Z)
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors [64.9938658716425]
Existing evaluations of large language models' (LLMs) ability to recognize and reject unsafe user requests face three limitations.
First, existing methods often use coarse-grained taxonomies of unsafe topics and over-represent some fine-grained topics.
Second, the linguistic characteristics and formatting of prompts, such as different languages and dialects, are often overlooked and only implicitly considered in many evaluations.
Third, existing evaluations rely on large LLMs for evaluation, which can be expensive.
arXiv Detail & Related papers (2024-06-20T17:56:07Z)
- Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z)
- AuditGPT: Auditing Smart Contracts with ChatGPT [8.697080709545352]
AuditGPT is a tool to automatically and comprehensively verify ERC rules against smart contracts.
It pinpoints 418 ERC rule violations and only reports 18 false positives, showcasing its effectiveness and accuracy.
arXiv Detail & Related papers (2024-04-05T07:19:13Z)
- TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications.
This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge.
arXiv Detail & Related papers (2024-02-19T21:12:14Z)
- Evaluation of ChatGPT's Smart Contract Auditing Capabilities Based on Chain of Thought [8.04987973069845]
This study explores the potential of enhancing smart contract security audits using the GPT-4 model.
We utilized a dataset of 35 smart contracts from the SolidiFI-benchmark vulnerability library, containing 732 vulnerabilities.
We found that GPT-4 performed poorly at detecting smart contract vulnerabilities: its precision was high at 96.6%, but its recall was only 37.8%, for an F1-score of 41.1%.
arXiv Detail & Related papers (2024-02-19T10:33:29Z)
- TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance [8.924352407824566]
Federated learning (FL) enables collaborative learning among multiple clients while ensuring individual data privacy.
In this paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework to enhance the reliability and security of the learning process.
Our simulation results demonstrate that HiAudit-FL can effectively identify and handle potential malicious users with small system overhead.
arXiv Detail & Related papers (2023-12-06T13:56:45Z)
- Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents [80.5213198675411]
Large language models (LLMs) have dramatically enhanced the field of language intelligence.
LLMs leverage chain-of-thought (CoT) reasoning techniques, which oblige them to formulate intermediate steps en route to deriving an answer.
Recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents.
arXiv Detail & Related papers (2023-11-20T14:30:55Z)
- Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives [8.524720028421447]
This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4.
Generating more answers with higher randomness largely boosts the likelihood of producing a correct answer, but inevitably leads to a higher number of false positives.
We propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages, generation and discrimination.
arXiv Detail & Related papers (2023-10-02T12:37:23Z)
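For contrast with the single-prompt scoping sketch above, here is a hypothetical sketch of the two-stage generate-then-discriminate loop that the GPTLens entry describes. The prompts and the `ask` helper are illustrative assumptions, not the authors' implementation; only the two-stage structure and the randomness/false-positive trade-off come from the entry.

```python
# Hypothetical sketch of GPTLens-style two-stage auditing; prompts are
# illustrative, not the authors'. Assumes an OpenAI-style client.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str, temperature: float) -> str:
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content


def two_stage_audit(contract_source: str, n_candidates: int = 5) -> list[str]:
    # Stage 1 (generation): sample candidates at high temperature, since
    # higher randomness raises the odds of surfacing the real bug.
    candidates = [
        ask(
            "You are a smart contract auditor. Propose one plausible "
            f"vulnerability in this contract:\n{contract_source}",
            temperature=1.0,
        )
        for _ in range(n_candidates)
    ]
    # Stage 2 (discrimination): a deterministic critic filters out the
    # false positives that high-randomness generation inevitably adds.
    accepted = []
    for claim in candidates:
        verdict = ask(
            "You are a skeptical critic. Given the contract and a claimed "
            "vulnerability, reply with exactly VALID or INVALID.\n"
            f"Contract:\n{contract_source}\nClaim:\n{claim}",
            temperature=0.0,
        )
        if verdict.strip().upper().startswith("VALID"):
            accepted.append(claim)
    return accepted
```

The division of labor mirrors the observation quoted in the entry: high-temperature generation buys recall, and the deterministic critic claws back precision.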