Malicious Source Code Detection Using Transformer
- URL: http://arxiv.org/abs/2209.07957v1
- Date: Fri, 16 Sep 2022 14:16:50 GMT
- Title: Malicious Source Code Detection Using Transformer
- Authors: Chen Tsfaty, Michael Fire
- Abstract summary: We introduce the Malicious Source code Detection using Transformers (MSDT) algorithm.
MSDT is a novel static analysis method based on deep learning that detects real-world code injection into source code packages.
Our algorithm is capable of detecting functions injected with malicious code with precision@k values of up to 0.909.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Using open source code is common practice in modern software
development. However, reusing third-party code gives bad actors access to a
wide community of developers, and hence to the products that rely on that
code. Such attacks are categorized as supply chain attacks. Recent years have
seen a growing number of supply chain attacks that leverage open source during
software development, exploiting the download and installation procedures,
whether automatic or manual. Over the years, many approaches have been devised
for detecting vulnerable packages; however, detecting malicious code within
packages remains uncommon. These detection approaches can be broadly
categorized as analyses that use (dynamic) or do not use (static) code
execution. Here, we introduce the Malicious Source code Detection using
Transformers (MSDT) algorithm. MSDT is a novel static analysis method based on
deep learning that detects real-world code injection into source code
packages. In this study, we used MSDT and a dataset of over 600,000 different
functions: we embedded the functions, applied a clustering algorithm to the
resulting vectors, and detected malicious functions as outliers. We evaluated
MSDT's performance by conducting extensive experiments and demonstrated that
our algorithm is capable of detecting functions injected with malicious code
with precision@k values of up to 0.909.
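The pipeline the abstract describes (embed each function, cluster the embeddings, flag outliers as malicious, evaluate with precision@k) can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: a deterministic hashed bag-of-tokens vector stands in for the paper's transformer embedding, and a k-nearest-neighbor distance score stands in for cluster-based outlier detection; all function names and parameters here are hypothetical.

```python
# Rough sketch of an MSDT-style pipeline with toy stand-ins for the
# paper's transformer embedding and clustering-based outlier detection.
import math
import zlib
from collections import Counter


def embed(source, dim=16):
    """Toy stand-in for a transformer embedding:
    L2-normalized hashed token counts."""
    vec = [0.0] * dim
    for tok, n in Counter(source.split()).items():
        vec[zlib.crc32(tok.encode()) % dim] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def outlier_scores(vectors, k=2):
    """Score each vector by its mean Euclidean distance to its
    k nearest neighbors; larger score = more anomalous."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scored functions that are truly malicious."""
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(labels[i] for i in ranked[:k]) / k
```

On a toy corpus of a few near-identical benign functions plus one function carrying an injected payload, the injected function's extra tokens push its embedding away from the benign cluster, so it receives the highest outlier score and is ranked first.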
Related papers
- FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score.
FoC-Sim outperforms the previous best methods with a 52% higher Recall@1.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
- Patch2QL: Discover Cognate Defects in Open Source Software Supply Chain With Auto-generated Static Analysis Rules [1.9591497166224197]
We propose a novel technique for detecting cognate defects in OSS through the automatic generation of SAST rules.
Specifically, it extracts key syntax and semantic information from pre- and post-patch versions of code.
We have implemented a prototype tool called Patch2QL and applied it to fundamental OSS in C/C++.
arXiv Detail & Related papers (2024-01-23T02:23:11Z)
- Source Code Clone Detection Using Unsupervised Similarity Measures [0.0]
This work presents a comparative analysis of unsupervised similarity measures for source code clone detection.
The goal is to overview the current state-of-the-art techniques, their strengths, and weaknesses.
arXiv Detail & Related papers (2024-01-18T10:56:27Z)
- Assaying on the Robustness of Zero-Shot Machine-Generated Text Detectors [57.7003399760813]
We explore advanced Large Language Models (LLMs) and their specialized variants, contributing to this field in several ways.
We uncover a significant correlation between topics and detection performance.
These investigations shed light on the adaptability and robustness of these detection methods across diverse topics.
arXiv Detail & Related papers (2023-12-20T10:53:53Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java codes.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow [2.561778620560749]
This paper mainly uses C/C++ source code from the SARD dataset, processing the source code of the CWE476, CWE469, CWE516, and CWE570 vulnerability types.
We propose a new cascading deep learning model VMCDL based on source code control flow to effectively detect vulnerabilities.
arXiv Detail & Related papers (2023-03-13T13:58:39Z)
- A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source codes, are the main root cause of cyberattacks.
We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.