Reliable Malware Analysis and Detection using Topology Data Analysis
- URL: http://arxiv.org/abs/2211.01535v1
- Date: Thu, 3 Nov 2022 00:46:52 GMT
- Title: Reliable Malware Analysis and Detection using Topology Data Analysis
- Authors: Lionel Nganyewou Tidjon and Foutse Khomh
- Abstract summary: Malware is becoming more complex and is spreading across networks, targeting diverse infrastructures and personal end-user devices.
To defend against malware, recent work has proposed techniques based on signatures and machine learning.
- Score: 12.031113181911627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malware is becoming increasingly complex and is spreading across
networks, targeting diverse infrastructures and personal end-user devices to
collect, modify, and destroy victim information. Malware behaviors are
polymorphic, metamorphic, and persistent: malware can hide to evade detectors,
adapt to new environments, and even leverage machine learning techniques to
damage targets more effectively. This makes it difficult to analyze and detect
with traditional endpoint detection and response tools and intrusion detection
and prevention systems. To defend against malware, recent work has proposed
techniques based on signatures and machine learning. In this paper, we propose
an algebraic topological approach called topological data analysis (TDA) to
efficiently analyze and detect complex malware patterns. We then compare
different TDA techniques (i.e., persistent homology, ToMATo, and TDA Mapper)
with existing dimensionality-reduction techniques (i.e., PCA, UMAP, and t-SNE)
using several classifiers, including random forest, decision tree, XGBoost,
and LightGBM. We also propose recommendations for deploying the
best-identified models for malware detection at scale. Results show that TDA
Mapper (combined with PCA) is better than PCA alone for clustering and for
identifying hidden relationships between malware clusters. Compared with UMAP
and t-SNE, persistence diagrams are better at identifying overlapping malware
clusters and have lower execution time. For malware detection, malware
analysts can use random forest and decision tree classifiers with t-SNE and
persistence diagrams to achieve better performance and robustness on noisy
data.
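Persistence diagrams record the distance scales at which clusters appear and merge. As a rough illustration only (this is not the authors' pipeline, whose tooling and feature extraction are not described here), the sketch below computes just the 0-dimensional part of a persistence diagram for a Vietoris-Rips filtration on Euclidean points. It uses union-find over edges sorted by length (a Kruskal-style pass). Higher-dimensional homology, ToMATo, and Mapper are out of scope; libraries such as GUDHI provide Rips complexes and full persistence computations. The function name `h0_persistence` and the toy points are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) of the Vietoris-Rips
    filtration on `points` under Euclidean distance.

    Every point is born at scale 0. When an edge of length d first joins two
    connected components, one component dies at d. The last surviving
    component never dies, so its death is math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges in the order a Rips filtration adds them as the scale grows.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge: one H0 bar ends at d
            parent[ri] = rj
            pairs.append((0.0, d))
    if n:
        pairs.append((0.0, math.inf))  # the component that never dies
    return pairs


if __name__ == "__main__":
    # Two tight pairs of points, 10 units apart.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(h0_persistence(pts))
```

For the two tight pairs above, the sketch yields two short bars ending at 1.0, one long bar ending at 10.0, and one infinite bar; the long finite bar signals two well-separated clusters, whereas groups that overlap would instead produce only short-lived bars. A detector would still need to turn such diagrams into fixed-length feature vectors before passing them to a classifier.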
Related papers
- ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection [52.228708947607636]
This paper proposes a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new anomaly detection methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
Date: 2024-06-05T13:40:07Z
- Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection [0.5475886285082937]
This study conducts a thorough examination of malware detection using machine learning techniques.
The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively.
Date: 2024-03-04T17:22:43Z
- Discovering Malicious Signatures in Software from Structural Interactions [7.06449725392051]
We propose a novel malware detection approach that leverages deep learning, mathematical techniques, and network science.
Our approach focuses on static and dynamic analysis and utilizes the Low-Level Virtual Machine (LLVM) to profile applications within a complex network.
Our approach marks a substantial improvement in malware detection, providing a notably more accurate and efficient solution.
Date: 2023-12-19T23:42:20Z
- Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications.
We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.
Our method utilizes a mask to figure out the memorized atypical samples, and then finetune the model or prune it with the introduced mask to forget them.
Date: 2023-06-06T14:23:34Z
- Survey of Malware Analysis through Control Flow Graph using Machine Learning [0.0]
Traditional signature-based malware detection methods have become ineffective in detecting new and unknown malware.
One of the most promising techniques that can overcome the limitations of signature-based detection is to use control flow graphs (CFGs).
CFGs leverage the structural information of a program to represent the possible paths of execution as a graph, where nodes represent instructions and edges represent control flow dependencies.
Machine learning (ML) algorithms are used to extract features from CFGs and classify them as malicious or benign.
Date: 2023-05-15T20:18:27Z
- A Survey on Malware Detection with Graph Representation Learning [0.0]
Malware detection has become a major concern due to the increasing number and complexity of malware.
In recent years, Machine Learning (ML) and notably Deep Learning (DL) achieved impressive results in malware detection by learning useful representations from data.
This paper provides an in-depth literature review to summarize and unify existing works under the common approaches and architectures.
Date: 2023-03-28T14:27:08Z
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
Date: 2023-03-20T17:25:22Z
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
Date: 2021-11-19T08:02:38Z
- Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network for automated detection of cancerous lesions on histological lung slides stained with hematoxylin and eosin (H&E).
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operating characteristic (ROC).
Date: 2021-10-27T19:28:36Z
- Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable files (PEs).
Given an input PE sample, it is first classified as either malicious or benign.
If malicious, the pipeline further analyzes it to establish its threat type, family, and behavior(s).
Date: 2021-06-10T10:07:50Z
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
Date: 2020-06-10T04:12:53Z
This list is automatically generated from the titles and abstracts of the papers on this site.
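The control-flow-graph survey entry in the list above describes a CFG whose nodes are instructions and whose edges are control flow dependencies, from which ML models extract features. A minimal sketch of that idea follows, assuming a hypothetical toy instruction format of `(mnemonic, target)` tuples in which `jmp` is an unconditional jump, `jcc` a conditional jump, and `ret` a return. The functions `build_cfg` and `cfg_features` and the feature set are illustrative and are not taken from any of the listed papers.

```python
from collections import defaultdict

# Hypothetical toy ISA: each instruction is (mnemonic, target_index_or_None).
# "jmp" = unconditional jump, "jcc" = conditional jump, "ret" = return;
# every other mnemonic simply falls through to the next instruction.
JUMPS = ("jmp", "jcc")
NO_FALLTHROUGH = ("jmp", "ret")


def build_cfg(instructions):
    """Instruction-level CFG: node i is instruction i, and j in succ[i]
    means control can flow from instruction i to instruction j."""
    succ = defaultdict(set)
    n = len(instructions)
    for i, (op, target) in enumerate(instructions):
        if op in JUMPS and target is not None:
            succ[i].add(target)  # taken-branch edge
        if op not in NO_FALLTHROUGH and i + 1 < n:
            succ[i].add(i + 1)  # fall-through edge
    return succ


def cfg_features(instructions):
    """A few structural features a classifier could consume."""
    succ = build_cfg(instructions)
    n = len(instructions)
    e = sum(len(s) for s in succ.values())
    return {
        "nodes": n,
        "edges": e,
        "branches": sum(1 for op, _ in instructions if op == "jcc"),
        # Jumps to an earlier (or the same) instruction: a cheap loop
        # indicator, not a dominator-based back-edge test.
        "backward_jumps": sum(1 for i, s in succ.items() for j in s if j <= i),
        # McCabe's M = E - N + 2P, assuming P = 1 (one connected function).
        "cyclomatic": e - n + 2,
    }


if __name__ == "__main__":
    # While-style loop: 1: cmp; 2: if cond goto 5; 3: add; 4: goto 1; 5: ret
    loop = [("mov", None), ("cmp", None), ("jcc", 5),
            ("add", None), ("jmp", 1), ("ret", None)]
    print(cfg_features(loop))
```

For the six-instruction loop, the sketch reports 6 nodes, 6 edges, one conditional branch, one backward jump, and a cyclomatic complexity of 2. A real pipeline would build the CFG with a disassembler and would typically use basic blocks rather than single instructions as nodes.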