PDF-Malware: An Overview on Threats, Detection and Evasion Attacks
- URL: http://arxiv.org/abs/2107.12873v1
- Date: Tue, 27 Jul 2021 15:15:20 GMT
- Title: PDF-Malware: An Overview on Threats, Detection and Evasion Attacks
- Authors: Nicolas Fleury, Theo Dubrunquez and Ihsen Alouani
- Abstract summary: The widespread use of PDF has instilled a false impression of inherent safety among benign users.
In this work, we give an overview of the PDF-malware detection problem.
- Score: 0.966840768820136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Portable Document Format, commonly known as PDF, has
become a democratized standard for document exchange and dissemination. This
trend is due to characteristics such as its flexibility and
portability across platforms. The widespread use of PDF has instilled a false
impression of inherent safety among benign users. However, the characteristics
of PDF have motivated hackers to exploit various types of vulnerabilities and
overcome security safeguards, thereby making the PDF format one of the most
efficient malicious code attack vectors. Therefore, efficiently detecting malicious PDF
files is crucial for information security. Several analysis techniques have been
proposed in the literature, be they static or dynamic, to extract the main
features that allow the discrimination of malware files from benign ones. Since
classical analysis techniques may be limited in the case of zero-days,
machine-learning based techniques have emerged recently as an automatic
PDF-malware detection method that is able to generalize from a set of training
samples. These techniques are themselves facing the challenge of evasion
attacks where a malicious PDF is transformed to look benign. In this work, we
give an overview of the PDF-malware detection problem. We give a perspective on
the new challenges and emerging solutions.
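As a rough illustration of the static feature extraction mentioned in the abstract, the following minimal Python sketch counts a few PDF name keywords in the raw bytes of a file. The keyword list and the function name `keyword_counts` are assumptions chosen for illustration and are not taken from the paper. The sketch also ignores compressed streams and hex-escaped names (for example `#4A` for `J`), which real malware may use to evade this kind of check.

```python
import re

# Hypothetical keyword list for illustration only; not taken from the paper.
SUSPICIOUS_KEYWORDS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/Launch",
    b"/EmbeddedFile",
)


def keyword_counts(pdf_bytes: bytes) -> dict:
    """Count occurrences of each keyword in the raw bytes of a PDF.

    A keyword only counts when it is not immediately followed by a letter,
    so a longer name that merely starts with the keyword is not counted.
    """
    counts = {}
    for keyword in SUSPICIOUS_KEYWORDS:
        pattern = re.escape(keyword) + rb"(?![A-Za-z])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


if __name__ == "__main__":
    # Tiny hand-written fragment standing in for a real file.
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
    )
    print(keyword_counts(sample))
```

Counts of this kind could serve as an input feature vector for a classifier; which features and which model to use are design choices that this sketch leaves open.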
Related papers
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
arXiv Detail & Related papers (2024-09-29T07:22:47Z) - Hiding Sensitive Information Using PDF Steganography [3.6533698604619587]
We present a novel PDF steganography algorithm based upon least-significant bit insertion into the real-valued operands of PDF stream operators.
We also provide a case study which embeds malware into a given cover PDF document.
arXiv Detail & Related papers (2024-05-01T20:54:12Z) - Baseline Defenses for Adversarial Attacks Against Aligned Language
Models [109.75753454188705]
Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z) - A Feature Set of Small Size for the PDF Malware Detection [8.282177703075451]
We propose a small feature set that does not require much domain knowledge of the PDF file.
We report a best accuracy of 99.75% when using a Random Forest model.
Despite the modest size of the feature set, we obtain results comparable to state-of-the-art approaches that employ a much larger set of features.
arXiv Detail & Related papers (2023-08-09T04:51:28Z) - DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified
Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z) - HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and
Statistical Analysis [16.224649756613655]
Malicious PDF documents present a serious threat to various security organizations.
State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware.
In this paper, we derive a simple yet effective holistic approach to PDF malware detection.
arXiv Detail & Related papers (2021-11-08T18:32:47Z) - Towards an Automated Pipeline for Detecting and Classifying Malware
through Machine Learning [0.0]
We propose a malware taxonomic classification pipeline able to classify Windows Portable Executable files (PEs).
Given an input PE sample, it is first classified as either malicious or benign.
If malicious, the pipeline further analyzes it in order to establish its threat type, family, and behavior(s).
arXiv Detail & Related papers (2021-06-10T10:07:50Z) - Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z) - Adversarial EXEmples: A Survey and Experimental Evaluation of Practical
Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that does not only encompass and generalize previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z) - Detecting malicious PDF using CNN [46.86114958340962]
Malicious PDF files represent one of the biggest threats to computer security.
We propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) on the byte level of the file.
We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
arXiv Detail & Related papers (2020-07-24T18:27:45Z) - Explanation-Guided Backdoor Poisoning Attacks Against Malware
Classifiers [12.78844634194129]
Training pipelines for machine learning based malware classification often rely on crowdsourced threat feeds.
This paper focuses on challenging "clean label" attacks where attackers do not control the sample labeling process.
We propose the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers.
arXiv Detail & Related papers (2020-03-02T17:04:38Z)