Related papers: Impact of Code Transformation on Detection of Smart Contract Vulnerabilities

URL: http://arxiv.org/abs/2410.21685v2
Date: Wed, 30 Oct 2024 06:23:13 GMT
Title: Impact of Code Transformation on Detection of Smart Contract Vulnerabilities
Authors: Cuong Tran Manh, Hieu Dinh Vo
Abstract summary: This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets. The approach centers on semantic-preserving code transformation, a technique that modifies the structure of source code without altering its semantics. The results show that many of the newly created vulnerabilities bypass detection tools, with the false negative rate reaching up to 100%.
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: While smart contracts are foundational elements of blockchain applications, their inherent susceptibility to security vulnerabilities poses a significant challenge. Existing training datasets employed for vulnerability detection tools may be limited, potentially compromising their efficacy. This paper presents a method for improving the quantity and quality of smart contract vulnerability datasets and evaluates current detection methods. The approach centers on semantic-preserving code transformation, a technique that modifies the structure of source code without altering its semantics. The transformed code snippets are inserted into all potential locations within benign smart contract code, creating new vulnerable contract versions. This method aims to generate a wider variety of vulnerable code, including code that can bypass detection by current analysis tools. The paper's experiments evaluate the method's effectiveness using tools such as Slither, Mythril, and CrossFuzz, focusing on metrics such as the number of generated vulnerable samples and the false negative rate in detecting these vulnerabilities. The results show that many of the newly created vulnerabilities bypass the tools, with the false negative rate reaching up to 100%, and that the method increases dataset size by at least 2.5x.
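The generate-then-evaluate loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a contract is modeled as a list of source lines, an insertion point is (hypothetically) any position inside a function body, `wrap_in_if_true` stands in for one semantic-preserving transformation, and `naive_detector` stands in for a tool such as Slither, Mythril, or CrossFuzz. Because every generated sample is vulnerable by construction, the false negative rate FN / (FN + TP) reduces to the fraction of samples the detector misses.

```python
from typing import Callable, List

Contract = List[str]  # toy model: a contract is a list of source lines


def wrap_in_if_true(stmt: str) -> str:
    """Hypothetical semantic-preserving rewrite: same behavior, different structure."""
    return f"if (true) {{ {stmt} }}"


def insertion_points(lines: Contract) -> List[int]:
    """Indices of lines after which a snippet can be inserted.

    Toy assumption: any position inside a function body counts, i.e. after the
    line that opens the body and after each line before its closing brace.
    """
    points: List[int] = []
    depth, in_function = 0, False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("function"):
            in_function, depth = True, 0
        if in_function:
            depth += stripped.count("{") - stripped.count("}")
            if depth > 0:
                points.append(i)
            else:
                in_function = False
    return points


def generate_variants(benign: Contract, snippet: str,
                      transforms: List[Callable[[str], str]]) -> List[Contract]:
    """Insert the snippet and each transformed form of it at every candidate point."""
    forms = [snippet] + [t(snippet) for t in transforms]
    return [benign[: p + 1] + [form] + benign[p + 1:]
            for p in insertion_points(benign) for form in forms]


def false_negative_rate(samples: List[Contract],
                        detector: Callable[[Contract], bool]) -> float:
    """FN / (FN + TP); every sample is vulnerable by construction."""
    if not samples:
        return 0.0
    missed = sum(1 for s in samples if not detector(s))
    return missed / len(samples)


def naive_detector(lines: Contract) -> bool:
    """Hypothetical stand-in for a real tool: matches only the untransformed pattern."""
    return any(line.strip().startswith("msg.sender.call") for line in lines)


benign = [
    "contract Bank {",
    "    mapping(address => uint) balances;",
    "    function withdraw(uint amount) public {",
    "        balances[msg.sender] -= amount;",
    "    }",
    "}",
]
# Hypothetical vulnerable snippet: the return value of the low-level call is ignored.
snippet = 'msg.sender.call{value: amount}("");'
variants = generate_variants(benign, snippet, [wrap_in_if_true])
print(len(variants), false_negative_rate(variants, naive_detector))  # 4 0.5
```

Here the two wrapped variants slip past the pattern-matching stand-in, illustrating in miniature the kind of evasion the paper reports for real tools.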

Related papers

Automated Vulnerability Injection in Solidity Smart Contracts: A Mutation-Based Approach for Benchmark Development
This work evaluates whether mutation seeding can effectively inject vulnerabilities into Solidity-based smart contracts. We propose MuSe, a tool to generate vulnerable smart contracts by leveraging pattern-based mutation operators. We analyzed these vulnerable smart contracts using Slither, a static analysis tool, to determine its capacity to identify them and assess their validity.
arXiv (2025-04-22T14:46:18Z)
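MuSe's abstract does not list its mutation operators, so the sketch below uses a hypothetical pattern-based operator to show the general idea of mutation seeding: find every site matching a source pattern and emit one mutant per site. The operator `mutate_transfer` and its toy regex are illustrative assumptions, not MuSe's implementation. It rewrites `x.transfer(v);` into `x.call{value: v}("");`, which ignores the call's success flag where `transfer` would revert on failure.

```python
import re
from typing import List

# Hypothetical operator: `x.transfer(v);` becomes an unchecked low-level call.
# Toy regex: it does not handle receivers such as `payable(msg.sender)`.
TRANSFER = re.compile(r"(\w[\w.\[\]]*)\.transfer\(([^;]*)\);")


def mutate_transfer(source: str) -> List[str]:
    """Return one mutant per match site, each differing from `source` at one site."""
    mutants = []
    for m in TRANSFER.finditer(source):
        mutated = f'{m.group(1)}.call{{value: {m.group(2)}}}("");'
        mutants.append(source[: m.start()] + mutated + source[m.end():])
    return mutants


contract = """function payout(address payable a, address payable b) public {
    a.transfer(1 ether);
    b.transfer(2 ether);
}"""
for mutant in mutate_transfer(contract):
    print(mutant, end="\n---\n")
```

Seeding one site per mutant keeps each generated contract's injected flaw localized, which makes it easier to check whether a tool such as Slither flags that specific site.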
Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection
We present SimilarGPT, a vulnerability identification tool for smart contracts. The core idea of SimilarGPT is to measure the similarity between the code under inspection and secure code from third-party libraries. We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection.
arXiv (2024-12-24T07:15:48Z)
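The abstract does not say what SimilarGPT's detection sequence ranges over. Assuming, hypothetically, that it orders a contract's functions along the call graph so that callees are checked before their callers, a topological ordering can be sketched with Python's standard `graphlib` module (Python 3.9+). The helper `detection_order` and the example call graph are illustrative only.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def detection_order(call_graph: Dict[str, Set[str]]) -> List[str]:
    """Return functions ordered so every callee precedes its callers.

    TopologicalSorter reads each mapped set as the node's predecessors and
    raises graphlib.CycleError if the graph has a cycle (e.g. recursion).
    """
    return list(TopologicalSorter(call_graph).static_order())


# Hypothetical contract: each function maps to the set of functions it calls.
calls = {
    "withdraw": {"_checkBalance", "_send"},
    "_checkBalance": {"_getBalance"},
    "_getBalance": set(),
    "_send": set(),
}
print(detection_order(calls))
```

Several orders can be valid when functions are independent of each other (here `_send` and `_getBalance`), so a real pipeline would need a tie-breaking rule if a deterministic sequence matters.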
HexaCoder: Secure Code Generation via Oracle-Guided Synthetic Training Data
Large language models (LLMs) have shown great potential for automatic code generation. Recent studies highlight that much LLM-generated code contains serious security vulnerabilities. We introduce HexaCoder, a novel approach to enhance the ability of LLMs to generate secure code.
arXiv (2024-09-10T12:01:43Z)
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. The adoption of specific software versions in development projects may introduce security risks when these versions contain vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with predefined rules. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++.
arXiv (2024-08-14T06:43:06Z)
Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks. Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task. FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability. FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
arXiv (2024-04-15T09:10:52Z)
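The abstract does not specify how FGVulDet combines its classifiers' outputs. One simple combination rule, assumed here purely for illustration, is to run one scorer per vulnerability type and keep the most confident type above a threshold. The type names, the keyword-based stand-in classifiers, and the `predict_type` helper are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical per-type scorer: returns a confidence in [0, 1] for one vulnerability type.
Classifier = Callable[[str], float]


def predict_type(code: str, classifiers: Dict[str, Classifier],
                 threshold: float = 0.5) -> Optional[str]:
    """Run one classifier per vulnerability type and keep the most confident one.

    Returns None when no classifier reaches the threshold, which this sketch
    treats as "no vulnerability found".
    """
    if not classifiers:
        return None
    scores = {vtype: clf(code) for vtype, clf in classifiers.items()}
    best = max(scores, key=scores.__getitem__)
    return best if scores[best] >= threshold else None


# Toy keyword-based stand-ins for trained per-type classifiers.
classifiers: Dict[str, Classifier] = {
    "buffer-overflow": lambda c: 0.9 if "strcpy(" in c else 0.1,
    "format-string": lambda c: 0.8 if "printf(buf)" in c else 0.1,
    "use-after-free": lambda c: 0.7 if "free(" in c else 0.1,
}
print(predict_type("strcpy(dst, src);", classifiers))  # buffer-overflow
print(predict_type("int x = 0;", classifiers))         # None
```

Unlike a single binary classifier, this setup names which type was detected, which is the distinction the abstract draws against binary (0-1) formulations.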
DeVAIC: A Tool for Security Assessment of AI-generated Code
DeVAIC (Detection of Vulnerabilities in AI-generated Code) is a tool to evaluate the security of AI-generated Python code.
arXiv (2024-04-11T08:27:23Z)
Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition. We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training. We propose an autoencoder-based training scheme to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv (2023-03-24T16:03:21Z)
CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks. Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities. This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv (2023-02-08T11:54:07Z)
DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation
The goal of the research is to intelligently detect vulnerabilities in the source code of high-level languages such as C/C++. This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model. Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv (2022-11-29T14:19:14Z)
Improving robustness of jet tagging algorithms with adversarial training
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks. We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv (2022-03-25T19:57:19Z)
VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code. Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph. VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv (2021-12-20T22:45:27Z)
Multi-context Attention Fusion Neural Network for Software Vulnerability Identification
We propose a deep learning model that learns to detect some of the common categories of security vulnerabilities in source code efficiently. The model builds an accurate understanding of code semantics with far fewer learnable parameters. The proposed model achieves a 98.40% F1-score on specific CWEs from the NIST SARD benchmark dataset.
arXiv (2021-04-19T11:50:36Z)
ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning
Existing machine learning-based vulnerability detection methods are limited and only inspect whether the smart contract is vulnerable. We propose ESCORT, the first Deep Neural Network (DNN)-based vulnerability detection framework for smart contracts. We show that ESCORT achieves an average F1-score of 95% on six vulnerability types, with a detection time of 0.02 seconds per contract.
arXiv (2021-03-23T15:04:44Z)
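For reference, the F1-scores quoted in the last two entries are the harmonic mean of precision P and recall R, computed from true positives (TP), false positives (FP), and false negatives (FN). The ESCORT abstract does not state how its "average" over six vulnerability types is taken; assuming an unweighted (macro) mean over K = 6 types, it would be:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}, \qquad
\overline{F_1} = \frac{1}{K}\sum_{k=1}^{K} F_{1,k}, \quad K = 6
```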

This list is automatically generated from the titles and abstracts of the papers on this site.