Related papers: Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection

Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection

URL: http://arxiv.org/abs/2412.18225v1
Date: Tue, 24 Dec 2024 07:15:48 GMT
Title: Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection
Authors: Jango Zhang,
Abstract summary: We present SimilarGPT, a vulnerability identification tool for smart contract.<n>The main concept of SimilarGPT is to measure the similarity between the code under inspection and the secure code from third-party libraries.<n>We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid growth of blockchain technology, smart contracts are now crucial to Decentralized Finance (DeFi) applications. Effective vulnerability detection is vital for securing these contracts against hackers and enhancing the accuracy and efficiency of security audits. In this paper, we present SimilarGPT, a unique vulnerability identification tool for smart contract, which combines Generative Pretrained Transformer (GPT) models with Code-based similarity checking methods. The main concept of the SimilarGPT tool is to measure the similarity between the code under inspection and the secure code from third-party libraries. To identify potential vulnerabilities, we connect the semantic understanding capability of large language models (LLMs) with Code-based similarity checking techniques. We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection. Through analysis of code reuse patterns in smart contracts, we compile and process extensive third-party library code to establish a comprehensive reference codebase. Then, we utilize LLM to conduct an indepth analysis of similar codes to identify and explain potential vulnerabilities in the codes. The experimental findings indicate that SimilarGPT excels in detecting vulnerabilities in smart contracts, particularly in missed detections and minimizing false positives.

Related papers

Malicious Code Detection in Smart Contracts via Opcode Vectorization [0.8225825738565354]
Security problems of smart contracts become increasingly prominent. The existence of malicious codes may lead to the loss of user assets and system crash. In this paper, a simple study is carried out on malicious code detection of intelligent contracts based on machine learning.
arXiv Detail & Related papers (2025-04-17T07:51:48Z)
SmartBugBert: BERT-Enhanced Vulnerability Detection for Smart Contract Bytecode [0.7018579932647147]
This paper introduces SmartBugBert, a novel approach that combines BERT-based deep learning with control flow graph (CFG) analysis to detect vulnerabilities directly from bytecode. Our method first decompiles smart contract bytecode into optimized opcode sequences, extracts semantic features using TF-IDF, constructs control flow graphs to capture execution logic, and isolates vulnerable CFG fragments for targeted analysis.
arXiv Detail & Related papers (2025-04-07T12:30:12Z)
Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy.
arXiv Detail & Related papers (2025-01-25T04:53:36Z)
Impact of Code Transformation on Detection of Smart Contract Vulnerabilities [0.0]
This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets. The approach centers around semantic-preserving code transformation, a technique that modifies the source code structure without altering its semantic meaning. The improved results show that many newly created vulnerabilities can bypass tools and the false reporting rate goes up to 100%.
arXiv Detail & Related papers (2024-10-29T03:08:25Z)
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data [60.75578581719921]
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that many LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure codes.
arXiv Detail & Related papers (2024-09-10T12:01:43Z)
Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models [1.081463830315253]
We empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub. We introduce Soley, an automated method for detecting logic vulnerabilities in smart contracts. We examine mitigation strategies employed by smart contract developers to address these vulnerabilities in real-world scenarios.
arXiv Detail & Related papers (2024-06-24T00:15:18Z)
Blockchain Smart Contract Threat Detection Technology Based on Symbolic Execution [0.0]
Reentrancy vulnerability, which is hidden and complex, poses a great threat to smart contracts. In this paper, we propose a smart contract threat detection technology based on symbolic execution. The experimental results show that this method significantly increases both detection efficiency and accuracy.
arXiv Detail & Related papers (2023-12-24T03:27:03Z)
Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks. This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs. We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes. We find that existing training-based or zero-shot text detectors are ineffective in detecting code. Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
Software Vulnerability Detection via Deep Learning over Disaggregated Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora. Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z)
Multi-context Attention Fusion Neural Network for Software Vulnerability Identification [4.05739885420409]
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently. The model builds an accurate understanding of code semantics with a lot less learnable parameters. The proposed AI achieves 98.40% F1-score on specific CWEs from the benchmarked NIST SARD dataset.
arXiv Detail & Related papers (2021-04-19T11:50:36Z)
ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning [80.85273827468063]
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable. We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts. We show that ESCORT achieves an average F1-score of 95% on six vulnerability types and the detection time is 0.02 seconds per contract.
arXiv Detail & Related papers (2021-03-23T15:04:44Z)
A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens. We show that despite the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.