Malicious Source Code Detection Using Transformer
- URL: http://arxiv.org/abs/2209.07957v1
- Date: Fri, 16 Sep 2022 14:16:50 GMT
- Title: Malicious Source Code Detection Using Transformer
- Authors: Chen Tsfaty, Michael Fire
- Abstract summary: We introduce the Malicious Source code Detection using Transformers (MSDT) algorithm.
MSDT is a novel deep-learning-based static analysis method that detects real-world code injections into source code packages.
Our algorithm is capable of detecting functions that were injected with malicious code with precision@k values of up to 0.909.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Using open source code is a common practice in modern software
development. However, reusing external code gives bad actors access to a wide
community of developers, and hence to the products that rely on that code.
Such attacks are categorized as supply chain attacks. Recent years have seen a
growing number of supply chain attacks that leverage open source during
software development, relying on the download and installation procedures,
whether automatic or manual. Over the years, many approaches have been devised
for detecting vulnerable packages; however, detecting malicious code within
packages remains uncommon. Detection approaches can be broadly categorized as
analyses that use (dynamic) or do not use (static) code execution. Here, we
introduce the Malicious Source code Detection using Transformers (MSDT)
algorithm. MSDT is a novel deep-learning-based static analysis method that
detects real-world code injections into source code packages. In this study,
we used MSDT and a dataset with over 600,000 different functions: we embedded
the functions and applied a clustering algorithm to the resulting vectors,
detecting malicious functions as outliers. We evaluated MSDT's performance
through extensive experiments and demonstrated that our algorithm is capable
of detecting functions injected with malicious code with precision@k values of
up to 0.909.
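The abstract outlines a pipeline (embed functions with a transformer, cluster the resulting vectors, flag outliers, evaluate with precision@k) without fixing the components in this summary. The sketch below is a minimal illustration under assumptions: CodeBERT as the encoder, DBSCAN as the clustering algorithm, and a mean-distance ranking heuristic; none of these are confirmed as the authors' exact choices.

```python
# Minimal MSDT-style sketch. Assumptions: CodeBERT encoder, DBSCAN clustering,
# mean-distance ranking -- illustrative stand-ins, not the paper's settings.
import numpy as np
import torch
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(functions: list[str]) -> np.ndarray:
    """One vector per function: mean-pooled last hidden states."""
    vecs = []
    with torch.no_grad():
        for src in functions:
            toks = tokenizer(src, truncation=True, max_length=512,
                             return_tensors="pt")
            hidden = model(**toks).last_hidden_state  # (1, seq_len, dim)
            vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vecs)

def outlier_scores(embeddings: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """DBSCAN labels sparse points -1 (outliers); the score is a function's
    mean cosine distance to all others, so isolated code ranks first."""
    labels = DBSCAN(eps=0.3, min_samples=5,
                    metric="cosine").fit_predict(embeddings)
    scores = cosine_distances(embeddings).mean(axis=1)
    return labels == -1, scores

def precision_at_k(scores: np.ndarray, is_malicious: np.ndarray,
                   k: int) -> float:
    """Fraction of the k highest-scored functions that are truly malicious."""
    top_k = np.argsort(scores)[::-1][:k]
    return float(is_malicious[top_k].mean())
```

Given ground-truth labels for the injected functions, precision_at_k computes the metric the abstract reports (up to 0.909 in the authors' experiments).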
Related papers
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
arXiv Detail & Related papers (2024-11-12T13:30:06Z)
- FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
arXiv Detail & Related papers (2024-05-21T19:54:19Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.
We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
- Patch2QL: Discover Cognate Defects in Open Source Software Supply Chain With Auto-generated Static Analysis Rules [1.9591497166224197]
We propose a novel technique for detecting cognate defects in OSS through the automatic generation of SAST rules.
Specifically, it extracts key syntax and semantic information from pre- and post-patch versions of code.
We have implemented a prototype tool called Patch2QL and applied it to fundamental OSS in C/C++.
arXiv Detail & Related papers (2024-01-23T02:23:11Z)
- Source Code Clone Detection Using Unsupervised Similarity Measures [0.0]
This work presents a comparative analysis of unsupervised similarity measures for source code clone detection.
The goal is to overview the current state-of-the-art techniques, their strengths, and weaknesses; a minimal example of one such measure appears after this list.
arXiv Detail & Related papers (2024-01-18T10:56:27Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for the detection of LLMs-generated codes.
We find that existing training-based or zero-shot text detectors are ineffective in detecting code.
Our method exhibits robustness against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- VMCDL: Vulnerability Mining Based on Cascaded Deep Learning Under Source Control Flow [2.561778620560749]
This paper mainly uses the C/C++ source code data of the SARD dataset, processing the source code of the CWE476, CWE469, CWE516, and CWE570 vulnerability types.
We propose VMCDL, a new cascaded deep learning model based on source code control flow, to detect vulnerabilities effectively.
arXiv Detail & Related papers (2023-03-13T13:58:39Z)
- A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities [6.09170287691728]
Software vulnerabilities, caused by unintentional flaws in source code, are the main root cause of cyberattacks.
We propose a deep learning approach to detect vulnerabilities from their LLVM IR representations based on the techniques that have been used in natural language processing.
arXiv Detail & Related papers (2022-11-15T21:21:27Z)
- Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
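As referenced in the clone-detection entry above, here is a minimal sketch of one unsupervised similarity measure of the kind that paper surveys: Jaccard similarity over token sets. The tokenizer below is an illustrative assumption, not a measure taken from the paper.

```python
# Token-set Jaccard similarity: one simple unsupervised measure for detecting
# code clones (the regex tokenizer here is an illustrative assumption).
import re

def tokens(source: str) -> set[str]:
    """Split source code into a set of identifier and operator tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source))

def jaccard(a: str, b: str) -> float:
    """|intersection| / |union| of the two token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

f1 = "def add(a, b):\n    return a + b"
f2 = "def add(x, y):\n    return x + y"
print(jaccard(f1, f2))  # a high score suggests a (renamed) clone
```

Being unsupervised, such measures need no labeled clone pairs; candidate pairs above a chosen similarity threshold are flagged for review.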
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.