Related papers: COBRA: Interaction-Aware Bytecode-Level Vulnerability Detector for Smart Contracts

COBRA: Interaction-Aware Bytecode-Level Vulnerability Detector for Smart Contracts

URL: http://arxiv.org/abs/2410.20712v1
Date: Mon, 28 Oct 2024 03:55:09 GMT
Title: COBRA: Interaction-Aware Bytecode-Level Vulnerability Detector for Smart Contracts
Authors: Wenkai Li, Xiaoqi Li, Zongwei Li, Yuqing Zhang,
Abstract summary: We propose COBRA, a framework that integrates semantic context and function interfaces to detect vulnerabilities in smart contracts. To infer the function signatures that are not present in signature databases, we present SRIF, which automatically learns the rules of function signatures from the smart contract bytecodes. Experimental results demonstrate that SRIF can achieve 94.76% F1-score for function signature inference.
Score: 4.891180928768215
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The detection of vulnerabilities in smart contracts remains a significant challenge. While numerous tools are available for analyzing smart contracts in source code, only about 1.79% of smart contracts on Ethereum are open-source. For existing tools that target bytecodes, most of them only consider the semantic logic context and disregard function interface information in the bytecodes. In this paper, we propose COBRA, a novel framework that integrates semantic context and function interfaces to detect vulnerabilities in bytecodes of the smart contract. To our best knowledge, COBRA is the first framework that combines these two features. Moreover, to infer the function signatures that are not present in signature databases, we present SRIF (Signatures Reverse Inference from Functions), automatically learn the rules of function signatures from the smart contract bytecodes. The bytecodes associated with the function signatures are collected by constructing a control flow graph (CFG) for the SRIF training. We optimize the semantic context using the operation code in the static single assignment (SSA) format. Finally, we integrate the context and function interface representations in the latent space as the contract feature embedding. The contract features in the hidden space are decoded for vulnerability classifications with a decoder and attention module. Experimental results demonstrate that SRIF can achieve 94.76% F1-score for function signature inference. Furthermore, when the ground truth ABI exists, COBRA achieves 93.45% F1-score for vulnerability classification. In the absence of ABI, the inferred function feature fills the encoder, and the system accomplishes an 89.46% recall rate.

Related papers

SmartBugBert: BERT-Enhanced Vulnerability Detection for Smart Contract Bytecode [0.7018579932647147]
This paper introduces SmartBugBert, a novel approach that combines BERT-based deep learning with control flow graph (CFG) analysis to detect vulnerabilities directly from bytecode. Our method first decompiles smart contract bytecode into optimized opcode sequences, extracts semantic features using TF-IDF, constructs control flow graphs to capture execution logic, and isolates vulnerable CFG fragments for targeted analysis.
arXiv Detail & Related papers (2025-04-07T12:30:12Z)
Automating Comment Generation for Smart Contract from Bytecode [11.143538294203026]
In practice, only 13% of smart contracts deployed on the component to the blockchain are associated with source code. We propose SmartBT (Smart contract Bytecode Translator) for automatically translating smart contract bytecode into fine-grained natural language description.
arXiv Detail & Related papers (2025-03-19T14:45:40Z)
SolBench: A Dataset and Benchmark for Evaluating Functional Correctness in Solidity Code Completion and Repair [51.0686873716938]
We introduce SolBench, a benchmark for evaluating the functional correctness of Solidity smart contracts generated by code completion models. We propose a Retrieval-Augmented Code Repair framework to verify functional correctness of smart contracts. Results show that code repair and retrieval techniques effectively enhance the correctness of smart contract completion while reducing computational costs.
arXiv Detail & Related papers (2025-03-03T01:55:20Z)
Analyzing the Impact of Copying-and-Pasting Vulnerable Solidity Code Snippets from Question-and-Answer Websites [3.844857617939819]
We conduct a study on the impact of vulnerable code reuse from Q&A websites during the development of smart contracts. This paper proposes a pattern-based vulnerability detection tool that is able to analyze code snippets (i.e., incomplete code) as well as full smart contracts. Our results show that our vulnerability search, as well as our code clone detection, are comparable to state-of-the-art while being applicable to code snippets.
arXiv Detail & Related papers (2024-09-11T19:33:27Z)
Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow [34.79673982473015]
We introduce SOChecker, a tool to identify potential vulnerabilities in incomplete SO smart contract code snippets. Results show that SOChecker achieves an F1 score of 68.2%, greatly surpassing GPT-3.5 and GPT-4. Our findings underscore the need to improve the security of code snippets from Q&A websites.
arXiv Detail & Related papers (2024-07-18T08:25:16Z)
Effective Targeted Testing of Smart Contracts [0.0]
Since smart contracts are immutable, their bugs cannot be fixed, which may lead to significant monetary losses. Our framework, Griffin, tackles this deficiency by employing a targeted symbolic execution technique for generating test data. This paper discusses how smart contracts differ from legacy software in targeted symbolic execution and how these differences can affect the tool structure.
arXiv Detail & Related papers (2024-07-05T04:38:11Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
SigRec: Automatic Recovery of Function Signatures in Smart Contracts [40.20115707680234]
It is challenging to recover function signatures from contract bytecode, since neither debug information nor type information is present in the bytecode. We develop SigRec, a new tool for recovering function signatures from contract bytecode without the need of source code and function signature databases.
arXiv Detail & Related papers (2023-05-11T18:03:39Z)
SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings [2.1222884030559315]
We propose SimCLF: A Simple Contrastive Learning Framework for Function-level Binary Embeddings. We take an unsupervised learning approach and formulate binary code similarity detection as instance discrimination. SimCLF directly operates on disassembled binary functions and could be implemented with any encoder.
arXiv Detail & Related papers (2022-09-06T12:09:45Z)
Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning [80.85273827468063]
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable. We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts. We show that ESCORT achieves an average F1-score of 95% on six vulnerability types and the detection time is 0.02 seconds per contract.
arXiv Detail & Related papers (2021-03-23T15:04:44Z)
Contrastive Code Representation Learning [95.86686147053958]
We show that the popular reconstruction-based BERT model is sensitive to source code edits, even when the edits preserve semantics. We propose ContraCode: a contrastive pre-training task that learns code functionality, not form.
arXiv Detail & Related papers (2020-07-09T17:59:06Z)
Robust Encodings: A Framework for Combating Adversarial Typos [85.70270979772388]
NLP systems are easily fooled by small perturbations of inputs. Existing procedures to defend against such perturbations provide guaranteed robustness to worst-case attacks. We introduce robust encodings (RobEn) that confer guaranteed robustness without making compromises on model architecture.
arXiv Detail & Related papers (2020-05-04T01:28:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.