Related papers: Malicious Code Detection: Run Trace Output Analysis by LSTM

Malicious Code Detection: Run Trace Output Analysis by LSTM

URL: http://arxiv.org/abs/2101.05646v1
Date: Thu, 14 Jan 2021 15:00:42 GMT
Title: Malicious Code Detection: Run Trace Output Analysis by LSTM
Authors: Cengiz Acarturk, Melih Sirlanci, Pinar Gurkan Balikcioglu, Deniz Demirci, Nazenin Sahin, Ozge Acar Kucuk
Abstract summary: We propose a methodological framework for detecting malicious code by analyzing run trace outputs by Long Short-Term Memory (LSTM) We created our dataset from run trace outputs obtained from dynamic analysis of PE files. Experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Malicious software threats and their detection have been gaining importance as a subdomain of information security due to the expansion of ICT applications in daily settings. A major challenge in designing and developing anti-malware systems is the coverage of the detection, particularly the development of dynamic analysis methods that can detect polymorphic and metamorphic malware efficiently. In the present study, we propose a methodological framework for detecting malicious code by analyzing run trace outputs by Long Short-Term Memory (LSTM). We developed models of run traces of malicious and benign Portable Executable (PE) files. We created our dataset from run trace outputs obtained from dynamic analysis of PE files. The obtained dataset was in the instruction format as a sequence and was called Instruction as a Sequence Model (ISM). By splitting the first dataset into basic blocks, we obtained the second one called Basic Block as a Sequence Model (BSM). The experiments showed that the ISM achieved an accuracy of 87.51% and a false positive rate of 18.34%, while BSM achieved an accuracy of 99.26% and a false positive rate of 2.62%.

Related papers

Probing Deep into Temporal Profile Makes the Infrared Small Target Detector Much Better [63.567886330598945]
Infrared small target (IRST) detection is challenging in simultaneously achieving precise, universal, robust and efficient performance.<n>Current learning-based methods attempt to leverage more" information from both the spatial and the short-term temporal domains.<n>We propose an efficient deep temporal probe network (DeepPro) that only performs calculations in the time dimension for IRST detection.
arXiv Detail & Related papers (2025-06-15T08:19:32Z)
Enhanced Consistency Bi-directional GAN(CBiGAN) for Malware Anomaly Detection [0.25163931116642785]
This paper introduces the application of the CBiGAN in the domain of malware anomaly detection.<n>We utilize several datasets including both portable executable (PE) files as well as Object Linking and Embedding (OLE) files.<n>We then evaluated our model against a diverse set of both PE and OLE files, including self-collected malicious executables from 214 malware families.
arXiv Detail & Related papers (2025-06-09T02:43:25Z)
LO2: Microservice API Anomaly Dataset of Logs and Metrics [42.61470118436856]
This dataset supports research on anomaly detection and architectural degradation in microservice systems. We generate a comprehensive dataset of logs, metrics, and traces from a production microservice system.
arXiv Detail & Related papers (2025-04-16T13:21:56Z)
DWFS-Obfuscation: Dynamic Weighted Feature Selection for Robust Malware Familial Classification under Obfuscation [1.4742348415878401]
We propose a dynamic weighted feature selection method that analyzes the importance and stability of features. We then utilize graph neural networks for classification, thereby improving the robustness and accuracy of the detection system. Experiments demonstrate that our proposed method achieves an F1-score of 95.56% on the unobfuscated dataset and 92.28% on the obfuscated dataset.
arXiv Detail & Related papers (2025-04-10T09:37:43Z)
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection [2.5228276786940182]
This paper introduces CASTLE, a benchmarking framework for evaluating the vulnerability detection capabilities of different methods. We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs.
arXiv Detail & Related papers (2025-03-12T14:30:05Z)
$\textit{X}^2$-DFD: A framework for e${X}$plainable and e${X}$tendable Deepfake Detection [52.14468236527728]
We propose a novel framework called $X2$-DFD, consisting of three core modules. The first module, Model Feature Assessment (MFA), measures the detection capabilities of forgery features intrinsic to MLLMs, and gives a descending ranking of these features. The second module, Strong Feature Strengthening (SFS), enhances the detection and explanation capabilities by fine-tuning the MLLM on a dataset constructed based on the top-ranked features. The third module, Weak Feature Supplementing (WFS), improves the fine-tuned MLLM's capabilities on lower-ranked features by integrating external dedicated
arXiv Detail & Related papers (2024-10-08T15:28:33Z)
SLIFER: Investigating Performance and Robustness of Malware Detection Pipelines [12.940071285118451]
academia focuses on combining static and dynamic analysis within a single or ensemble of models. In this paper, we investigate the properties of malware detectors built with multiple and different types of analysis. As far as we know, we are the first to investigate the properties of sequential malware detectors, shedding light on their behavior in real production environment.
arXiv Detail & Related papers (2024-05-23T12:06:10Z)
Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection. It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs [54.27040631527217]
We propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. We first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. We then build a binary code similarity model (FoC-Sim) upon the FoC-BinLLM to create change-sensitive representations and use it to retrieve similar implementations of unknown cryptographic functions in a database.
arXiv Detail & Related papers (2024-03-27T09:45:33Z)
Shifting the Lens: Detecting Malicious npm Packages using Large Language Models [4.479741014073169]
Existing malicious code detection techniques often suffer from high misclassification rates. We present SecurityAI, a malicious code review workflow to detect malicious code using ChatGPT. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores.
arXiv Detail & Related papers (2024-03-18T19:10:12Z)
Machine learning-based network intrusion detection for big and imbalanced data using oversampling, stacking feature embedding and feature extraction [6.374540518226326]
Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities. This paper introduces a novel ML-based network intrusion detection model that uses Random Oversampling (RO) to address data imbalance and Stacking Feature Embedding (PCA) for dimension reduction. Using the CIC-IDS 2017 dataset, DT, RF, and ET models reach 99.99% accuracy, while DT and RF models obtain 99.94% accuracy on CIC-IDS 2018 dataset.
arXiv Detail & Related papers (2024-01-22T05:49:41Z)
Discovering Malicious Signatures in Software from Structural Interactions [7.06449725392051]
We propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science. Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network. Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution.
arXiv Detail & Related papers (2023-12-19T23:42:20Z)
Malicious code detection in android: the role of sequence characteristics and disassembling methods [0.0]
We investigate and emphasize the factors that may affect the accuracy values of the models managed by researchers. Our findings exhibit that the disassembly method and different input representations affect the model results.
arXiv Detail & Related papers (2023-12-02T11:55:05Z)
MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation. This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
Analyzing Modality Robustness in Multimodal Sentiment Analysis [48.52878002917685]
Building robust multimodal models is crucial for achieving reliable deployment in the wild. We propose simple diagnostic checks for modality robustness in a trained multimodal model. We analyze well-known robust training strategies to alleviate the issues.
arXiv Detail & Related papers (2022-05-30T23:30:16Z)
Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable files (PEs) Given an input PE sample, it is first classified as either malicious or benign. If malicious, the pipeline further analyzes it in order to establish its threat type, family, and behavior(s)
arXiv Detail & Related papers (2021-06-10T10:07:50Z)
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models [59.04636530383049]
Anomalies or failures in large computer systems, such as the cloud, have an impact on a large number of users. We propose a framework for anomaly detection in log data, as a major troubleshooting source of system information.
arXiv Detail & Related papers (2021-02-23T09:17:05Z)
Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records. Existing approaches rely on log-specifics or manual rule extraction. We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
arXiv Detail & Related papers (2020-03-17T19:25:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.