Malicious Code Detection: Run Trace Output Analysis by LSTM
- URL: http://arxiv.org/abs/2101.05646v1
- Date: Thu, 14 Jan 2021 15:00:42 GMT
- Title: Malicious Code Detection: Run Trace Output Analysis by LSTM
- Authors: Cengiz Acarturk, Melih Sirlanci, Pinar Gurkan Balikcioglu, Deniz Demirci, Nazenin Sahin, Ozge Acar Kucuk
- Abstract summary: We propose a methodological framework for detecting malicious code by analyzing run trace outputs with Long Short-Term Memory (LSTM) networks.
We created our dataset from run trace outputs obtained from dynamic analysis of PE files.
Experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Malicious software threats and their detection have been gaining importance
as a subdomain of information security due to the expansion of ICT applications
in daily settings. A major challenge in designing and developing anti-malware
systems is the coverage of the detection, particularly the development of
dynamic analysis methods that can detect polymorphic and metamorphic malware
efficiently. In the present study, we propose a methodological framework for
detecting malicious code by analyzing run trace outputs with Long Short-Term
Memory (LSTM) networks. We developed models of run traces of malicious and benign
Portable Executable (PE) files. We created our dataset from run trace outputs
obtained from dynamic analysis of PE files. The resulting dataset represented
each trace as a sequence of instructions and was called the Instruction as a
Sequence Model (ISM). By splitting the first dataset into basic blocks, we
obtained a second dataset, called the Basic Block as a Sequence Model (BSM).
The experiments showed that
the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%,
while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.
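The abstract does not spell out how the instruction sequences (ISM) are cut into basic blocks (BSM), or how the reported metrics are computed. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each run-trace line is assumed to be a textual x86 instruction such as `mov eax, 1`, and a block is assumed to end right after a control-transfer instruction. The terminator set `BLOCK_TERMINATORS` and the function names `mnemonic`, `split_into_basic_blocks`, and `accuracy_and_fpr` are hypothetical. The metric function uses the standard definitions, with the false positive rate taken as FP / (FP + TN).

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: illustrative set of x86 control-transfer mnemonics that end a
# basic block. The paper's actual splitting rule may differ.
BLOCK_TERMINATORS = {
    "jmp", "je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle",
    "ja", "jae", "jb", "jbe", "js", "jns", "loop", "call", "ret",
}


def mnemonic(instruction: str) -> str:
    """Return the lower-cased opcode of a trace line such as 'mov eax, 1'."""
    parts = instruction.strip().split(maxsplit=1)
    return parts[0].lower() if parts else ""


def split_into_basic_blocks(trace: Sequence[str]) -> List[List[str]]:
    """Turn an instruction sequence (ISM-style) into basic blocks (BSM-style).

    A block is closed right after each control-transfer instruction; any
    trailing instructions without a terminator form the final block.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    for instruction in trace:
        current.append(instruction)
        if mnemonic(instruction) in BLOCK_TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def accuracy_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and false positive rate (1 = malicious, 0 = benign).

    FPR = FP / (FP + TN): the share of benign samples flagged as malicious.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


if __name__ == "__main__":
    trace = ["push ebp", "mov ebp, esp", "call 0x401000",
             "xor eax, eax", "ret", "mov ecx, 5"]
    for block in split_into_basic_blocks(trace):
        print(block)
    print(accuracy_and_fpr([1, 0, 0, 1], [1, 1, 0, 1]))
```

For the sample trace, the sketch yields three blocks: one ending at the `call`, one ending at the `ret`, and a trailing `mov ecx, 5` block. The metric call returns `(0.75, 0.5)`. A run trace is already linearized in execution order, so the instruction after a taken jump is that jump's target. Ending a block after each terminator therefore also starts a new block at every executed target. A static disassembly would additionally need the jump targets to locate block leaders.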
Related papers
- X²-DFD: A framework for eXplainable and eXtendable Deepfake Detection [52.14468236527728]
We propose a novel framework called X²-DFD, consisting of three core modules.
The first module, Model Feature Assessment (MFA), measures the detection capabilities of forgery features intrinsic to MLLMs, and gives a descending ranking of these features.
The second module, Strong Feature Strengthening (SFS), enhances the detection and explanation capabilities by fine-tuning the MLLM on a dataset constructed based on the top-ranked features.
The third module, Weak Feature Supplementing (WFS), improves the fine-tuned MLLM's capabilities on lower-ranked features by integrating external dedicated
arXiv (2024-10-08T15:28:33Z)
- FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries.
We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language.
We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv (2024-03-27T09:45:33Z)
- Shifting the Lens: Detecting Malicious npm Packages using Large Language Models [4.479741014073169]
Existing malicious code detection techniques often suffer from high misclassification rates.
We present SecurityAI, a malicious code review workflow to detect malicious code using ChatGPT.
Our baseline comparison shows improvements over static analysis of 16% in precision and 9% in F1 score.
arXiv (2024-03-18T19:10:12Z)
- Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction [6.374540518226326]
Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities.
This paper introduces a novel ML-based network intrusion detection model that uses Random Oversampling (RO) to address data imbalance, Stacking Feature Embedding based on clustering results, and Principal Component Analysis (PCA) for dimension reduction.
Using the CIC-IDS 2017 dataset, DT, RF, and ET models reach 99.99% accuracy, while DT and RF models obtain 99.94% accuracy on CIC-IDS 2018 dataset.
arXiv (2024-01-22T05:49:41Z)
- Discovering Malicious Signatures in Software from Structural Interactions [7.06449725392051]
We propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science.
Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network.
Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution.
arXiv (2023-12-19T23:42:20Z)
- Malicious code detection in Android: the role of sequence characteristics and disassembling methods [0.0]
We investigate and emphasize the factors that may affect the accuracy of the models developed by researchers.
Our findings show that the disassembly method and different input representations affect the model results.
arXiv (2023-12-02T11:55:05Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require access to the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv (2023-02-09T12:06:08Z)
- Analyzing Modality Robustness in Multimodal Sentiment Analysis [48.52878002917685]
Building robust multimodal models is crucial for achieving reliable deployment in the wild.
We propose simple diagnostic checks for modality robustness in a trained multimodal model.
We analyze well-known robust training strategies to alleviate the issues.
arXiv (2022-05-30T23:30:16Z)
- Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable (PE) files.
Given an input PE sample, it is first classified as either malicious or benign.
If malicious, the pipeline further analyzes it to establish its threat type, family, and behavior(s).
arXiv (2021-06-10T10:07:50Z)
- Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users.
We propose a framework for anomaly detection in log data, a major troubleshooting source of system information.
arXiv (2021-02-23T09:17:05Z)
- Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv (2020-03-17T19:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.