Esim: EVM Bytecode Similarity Detection Based on Stable-Semantic Graph
- URL: http://arxiv.org/abs/2511.12971v1
- Date: Mon, 17 Nov 2025 04:48:52 GMT
- Title: Esim: EVM Bytecode Similarity Detection Based on Stable-Semantic Graph
- Authors: Zhuo Chen, Gaoqiang Ji, Yiling He, Lei Wu, Yajin Zhou,
- Abstract summary: prevalent code reuse and limited open-source contributions have introduced significant challenges to the blockchain ecosystem.<n>Traditional binary similarity detection methods are typically based on instruction stream or control flow graph.<n>We propose a novel EVM bytecode representation called Stable-Semantic Graph (SSG)<n>We implement a prototype, Esim, which embeds SSG into matrices for similarity detection using a heterogeneous graph neural network.
- Score: 18.420449483065997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized finance (DeFi) is experiencing rapid expansion. However, prevalent code reuse and limited open-source contributions have introduced significant challenges to the blockchain ecosystem, including plagiarism and the propagation of vulnerable code. Consequently, an effective and accurate similarity detection method for EVM bytecode is urgently needed to identify similar contracts. Traditional binary similarity detection methods are typically based on instruction stream or control flow graph (CFG), which have limitations on EVM bytecode due to specific features like low-level EVM bytecode and heavily-reused basic blocks. Moreover, the highly-diverse Solidity Compiler (Solc) versions further complicate accurate similarity detection. Motivated by these challenges, we propose a novel EVM bytecode representation called Stable-Semantic Graph (SSG), which captures relationships between 'stable instructions' (special instructions identified by our study). Moreover, we implement a prototype, Esim, which embeds SSG into matrices for similarity detection using a heterogeneous graph neural network. Esim demonstrates high accuracy in SSG construction, achieving F1-scores of 100% for control flow and 95.16% for data flow, and its similarity detection performance reaches 96.3% AUC, surpassing traditional approaches. Our large-scale study, analyzing 2,675,573 smart contracts on six EVM-compatible chains over a one-year period, also demonstrates that Esim outperforms the SOTA tool Etherscan in vulnerability search.
Related papers
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection [105.14032334647932]
Machine-generated texts (MGTs) pose risks such as disinformation and phishing, highlighting the need for reliable detection.<n> Metric-based methods, which extract statistically distinguishable features of MGTs, are often more practical than complex model-based methods that are prone to overfitting.<n>We propose a Markov-informed score calibration strategy that models two relationships of context detection scores that may aid calibration.
arXiv Detail & Related papers (2026-02-08T16:06:12Z) - RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection [0.8373057326694192]
This paper presents a neuro-symbolic anomaly detection framework that combines a Graph Autoencoder with rare pattern mining.<n>Anomaly candidates are identified through deviations between observed and reconstructed graph structure.<n>We evaluate the proposed method on the DARPA Transparent Computing datasets and show that rare-pattern boosting yields substantial gains in anomaly ranking quality.
arXiv Detail & Related papers (2026-02-03T00:02:37Z) - Hybrid Cryptographic Monitoring System for Side-Channel Attack Detection on PYNQ SoCs [0.0]
AES-128 encryption is theoretically secure but vulnerable in practical deployments due to timing and fault injection attacks on embedded systems.<n>This work presents a lightweight dual-detection framework combining statistical thresholding and machine learning (ML) for real-time anomaly detection.
arXiv Detail & Related papers (2025-08-29T13:13:43Z) - ML-Enhanced AES Anomaly Detection for Real-Time Embedded Security [0.0]
We propose a comprehensive framework that enhances AES-128 encryption security through controlled anomaly injection and real-time anomaly detection.<n>We simulate timing and fault-based anomalies by injecting execution delays and ciphertext perturbations during encryption, generating labeled datasets for detection model training.<n>Our results show that ML-based detection significantly outperforms threshold-based methods in precision and recall while maintaining real-time performance on embedded hardware.
arXiv Detail & Related papers (2025-07-06T00:22:58Z) - Decompiling Smart Contracts with a Large Language Model [51.49197239479266]
Despite Etherscan's 78,047,845 smart contracts deployed on (as of May 26, 2025), a mere 767,520 ( 1%) are open source.<n>This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode.<n>We introduce a pioneering decompilation pipeline that transforms bytecode into human-readable and semantically faithful Solidity code.
arXiv Detail & Related papers (2025-06-24T13:42:59Z) - AST-Enhanced or AST-Overloaded? The Surprising Impact of Hybrid Graph Representations on Code Clone Detection [0.0]
Code clones significantly increase software maintenance costs and heighten vulnerability risks.<n>ASTs dominate deep learning-based code clone detection due to their precise syntactic structure representation.<n>Recent studies address this by enriching AST-based representations with semantic graphs.
arXiv Detail & Related papers (2025-06-17T12:35:17Z) - Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection [0.0]
We present SimilarGPT, a vulnerability identification tool for smart contract.<n>The main concept of SimilarGPT is to measure the similarity between the code under inspection and the secure code from third-party libraries.<n>We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection.
arXiv Detail & Related papers (2024-12-24T07:15:48Z) - Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR)
Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design.
The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z) - Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with attention mechanism, we can effectively boost performance without huge computational overhead.
We show our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z) - UniASM: Binary Code Similarity Detection without Fine-tuning [2.2329530239800035]
We propose a novel rich-semantic function representation technique to ensure the model captures the intricate nuances of binary code.<n>We introduce the first UniLM-based binary code embedding model, named UniASM, which includes two newly designed training tasks.<n>The experimental results show that UniASM outperforms the state-of-the-art (SOTA) approaches on the evaluation datasets.
arXiv Detail & Related papers (2022-10-28T14:04:57Z) - Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV)
We find that the difference between the ASV scores for the original and re-synthesize audio is a good indicator for discrimination between genuine and adversarial samples.
Our codes will be made open-source for future works to do comparison.
arXiv Detail & Related papers (2021-07-01T08:58:16Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.