HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and
Statistical Analysis
- URL: http://arxiv.org/abs/2111.04703v1
- Date: Mon, 8 Nov 2021 18:32:47 GMT
- Title: HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and
Statistical Analysis
- Authors: Tajuddin Manhar Mohammed, Lakshmanan Nataraj, Satish Chikkagoudar,
Shivkumar Chandrasekaran, B.S. Manjunath
- Abstract summary: Malicious PDF documents present a serious threat to various security organizations.
State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware.
In this paper, we derive a simple yet effective holistic approach to PDF malware detection.
- Score: 16.224649756613655
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Malicious PDF documents present a serious threat to various security
organizations that require modern threat intelligence platforms to effectively
analyze and characterize the identity and behavior of PDF malware.
State-of-the-art approaches use machine learning (ML) to learn features that
characterize PDF malware. However, ML models are often susceptible to evasion
attacks, in which an adversary obfuscates the malware code to avoid being
detected by antivirus software. In this paper, we derive a simple yet effective
holistic approach to PDF malware detection that leverages signal and
statistical analysis of malware binaries. This includes combining orthogonal
feature space models from various static and dynamic malware detection methods
to enable generalized robustness when faced with code obfuscations. Using a
dataset of nearly 30,000 PDF files containing both malware and benign samples,
we show that our holistic approach maintains a high detection rate (99.92%) of
PDF malware and even detects new malicious files created by simple methods that
remove the obfuscation conducted by malware authors to hide their malware,
which are undetected by most antiviruses.
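The abstract does not spell out which signal and statistical features are used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes byte-level statistics (a normalized byte histogram and Shannon entropy, optionally computed over fixed-size windows) as example features, and simple score averaging as one possible way to combine independent models. All function names, the window size, and the threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Normalized 256-bin histogram of byte values (a statistical feature)."""
    counts = Counter(data)
    n = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / n for b in range(256)]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, ranging from 0.0 to 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256) -> list[float]:
    """Entropy of consecutive fixed-size windows, treating the file as a 1-D signal."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def fuse_scores(scores: list[float], threshold: float = 0.5) -> bool:
    """Late fusion (assumed here): average per-model malware probabilities."""
    return sum(scores) / len(scores) >= threshold


if __name__ == "__main__":
    sample = b"%PDF-1.4\n" + bytes(range(256)) * 4  # synthetic bytes, not a real PDF
    print(len(byte_histogram(sample)))
    print(round(shannon_entropy(sample), 3))
    print(len(windowed_entropy(sample)))
    print(fuse_scores([0.9, 0.2]))
```

In a real pipeline, each feature vector would feed its own classifier, and the fusion step would combine the outputs of static and dynamic detectors. Averaging is only one of several possible combination rules.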
Related papers
- A Feature Set of Small Size for the PDF Malware Detection [8.282177703075451]
We propose a small feature set that doesn't require much domain knowledge of the PDF file format.
We report the best accuracy of 99.75% when using Random Forest model.
Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.
arXiv Detail & Related papers (2023-08-09T04:51:28Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- Adversarial Attacks against Windows PE Malware Detection: A Survey of
the State-of-the-Art [44.975088044180374]
This paper focuses on malware with the file format of portable executable (PE) in the family of Windows operating systems, namely Windows PE malware.
We first outline the general learning framework of Windows PE malware detection based on ML/DL.
We then highlight three unique challenges of performing adversarial attacks in the context of PE malware.
arXiv Detail & Related papers (2021-12-23T02:12:43Z)
- Mate! Are You Really Aware? An Explainability-Guided Testing Framework
for Robustness of Malware Detectors [49.34155921877441]
We propose an explainability-guided and model-agnostic testing framework for robustness of malware detectors.
We then use this framework to test several state-of-the-art malware detectors' abilities to detect manipulated malware.
Our findings shed light on the limitations of current malware detectors, as well as how they can be improved.
arXiv Detail & Related papers (2021-11-19T08:02:38Z)
- PDF-Malware: An Overview on Threats, Detection and Evasion Attacks [0.966840768820136]
The widespread use of PDF has instilled a false impression of inherent safety among benign users.
In this work, we give an overview on the PDF-malware detection problem.
arXiv Detail & Related papers (2021-07-27T15:15:20Z)
- A Novel Malware Detection Mechanism based on Features Extracted from
Converted Malware Binary Images [0.22843885788439805]
We use malware binary images, extract different features from them, and then apply different ML classifiers to the resulting dataset.
We show that this technique is successful in differentiating classes of malware based on the features extracted.
arXiv Detail & Related papers (2021-04-14T06:55:52Z)
- Binary Black-box Evasion Attacks Against Deep Learning-based Static
Malware Detectors with Adversarial Byte-Level Language Model [11.701290164823142]
MalRNN is a novel approach to automatically generate evasive malware variants without restrictions.
MalRNN effectively evades three recent deep learning-based malware detectors and outperforms current benchmark methods.
arXiv Detail & Related papers (2020-12-14T22:54:53Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical
Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z)
- Detecting malicious PDF using CNN [46.86114958340962]
Malicious PDF files represent one of the biggest threats to computer security.
We propose a novel algorithm that uses an ensemble of Convolutional Neural Network (CNN) on the byte level of the file.
We show, using a data set of 90,000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
arXiv Detail & Related papers (2020-07-24T18:27:45Z)