Detecting malicious PDF using CNN
- URL: http://arxiv.org/abs/2007.12729v2
- Date: Sun, 2 Aug 2020 10:15:50 GMT
- Title: Detecting malicious PDF using CNN
- Authors: Raphael Fettaya and Yishay Mansour
- Abstract summary: Malicious PDF files represent one of the biggest threats to computer security.
We propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) on the byte level of the file.
We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
- Score: 46.86114958340962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malicious PDF files represent one of the biggest threats to computer
security. To detect them, significant research has been done using handwritten
signatures or machine learning based on manual feature extraction. Both
approaches are time-consuming and require significant prior knowledge, and the
list of features has to be updated with each newly discovered vulnerability. In
this work, we propose a novel algorithm that uses an ensemble of Convolutional
Neural Networks (CNNs) on the byte level of the file, without any handcrafted
features. We show, using a data set of 90000 files downloadable online, that
our approach maintains a high detection rate (94%) of PDF malware and even
detects new malicious files, still undetected by most antiviruses. Using
automatically generated features from our CNN, and applying a
clustering algorithm, we also obtain high similarity between the antiviruses'
labels and the resulting clusters.
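The byte-level input and the ensemble step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the file is windowed or how the ensemble members are combined, so the fixed-length truncation and padding, the padding id `PAD_ID`, the window size `MAX_LEN`, the simple score averaging, and the 0.5 threshold are all assumptions. The `models` argument stands in for trained CNNs; here any callable that maps a sequence of byte ids to a probability will do.

```python
from typing import Callable, List, Sequence

PAD_ID = 256   # hypothetical padding id, distinct from the byte values 0..255
MAX_LEN = 16   # illustrative only; a real model would read a much longer window

# A "model" here is any callable returning a malicious-probability in [0, 1].
Model = Callable[[Sequence[int]], float]


def bytes_to_ids(data: bytes, max_len: int = MAX_LEN) -> List[int]:
    """Truncate or right-pad raw file bytes to a fixed-length list of ids."""
    ids = list(data[:max_len])
    ids.extend([PAD_ID] * (max_len - len(ids)))
    return ids


def ensemble_score(ids: Sequence[int], models: Sequence[Model]) -> float:
    """Average the scores of all ensemble members (assumed combination rule)."""
    if not models:
        raise ValueError("need at least one model")
    return sum(model(ids) for model in models) / len(models)


def is_malicious(data: bytes, models: Sequence[Model], threshold: float = 0.5) -> bool:
    """Flag a file when the averaged score reaches the (assumed) threshold."""
    return ensemble_score(bytes_to_ids(data), models) >= threshold
```

In practice each entry of `models` would wrap a trained network that embeds the ids and applies 1D convolutions; no handcrafted PDF features are computed at any point, which is the property the abstract emphasizes.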
Related papers
- Towards Novel Malicious Packet Recognition: A Few-Shot Learning Approach [0.0]
Deep Packet Inspection (DPI) has emerged as a key technology in strengthening network security.
This study proposes a novel approach that leverages a large language model (LLM) and few-shot learning.
Our approach shows promising results with an average accuracy of 86.35% and F1-Score of 86.40% on different malware types.
arXiv Detail & Related papers (2024-09-17T15:02:32Z)
- Online Clustering of Known and Emerging Malware Families [1.2289361708127875]
It is essential to categorize malware samples according to their malicious characteristics.
Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats.
This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families.
arXiv Detail & Related papers (2024-05-06T09:20:17Z)
- A Feature Set of Small Size for the PDF Malware Detection [8.282177703075451]
We propose a small feature set that doesn't require too much domain knowledge of the PDF file.
We report the best accuracy of 99.75% when using Random Forest model.
Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.
arXiv Detail & Related papers (2023-08-09T04:51:28Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Adversarial Networks and Machine Learning for File Classification [0.0]
Correctly identifying the type of file under examination is a critical part of a forensic investigation.
We propose using an adversarially-trained machine learning neural network to determine a file's true type.
Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types.
arXiv Detail & Related papers (2023-01-27T19:40:03Z)
- HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis [16.224649756613655]
Malicious PDF documents present a serious threat to various security organizations.
State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware.
In this paper, we derive a simple yet effective holistic approach to PDF malware detection.
arXiv Detail & Related papers (2021-11-08T18:32:47Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
The influence of embedding reversible watermarking on the classification performance is less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
- Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) can have 'structural malware' (i.e., compromised weights and activation pathways).
It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data).
Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, 'fingerprints' its nonlinearity, and allows us to detect backdoors (if present).
Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus
arXiv Detail & Related papers (2020-07-31T23:52:58Z)
- Automating Botnet Detection with Graph Neural Networks [106.24877728212546]
Botnets are now a major source for many network attacks, such as DDoS attacks and spam.
In this paper, we consider the neural network design challenges of using modern deep learning techniques to learn policies for botnet detection automatically.
arXiv Detail & Related papers (2020-03-13T15:34:33Z)