Related papers: Detecting malicious PDF using CNN

URL: http://arxiv.org/abs/2007.12729v2
Date: Sun, 2 Aug 2020 10:15:50 GMT
Title: Detecting malicious PDF using CNN
Authors: Raphael Fettaya and Yishay Mansour
Abstract summary: Malicious PDF files represent one of the biggest threats to computer security. We propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) on the byte level of the file. We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
Score: 46.86114958340962
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Malicious PDF files represent one of the biggest threats to computer security. To detect them, significant research has been done using handwritten signatures or machine learning based on manual feature extraction. Those approaches are time-consuming and require significant prior knowledge, and the list of features has to be updated with each newly discovered vulnerability. In this work, we propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) on the byte level of the file, without any handcrafted features. We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware and even detects new malicious files that are still undetected by most antiviruses. Using features automatically generated by our CNN, and applying a clustering algorithm, we also obtain high similarity between the antiviruses' labels and the resulting clusters.
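Illustration: the abstract describes byte-level input with no handcrafted features and an ensemble of models, but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' method. The names `bytes_to_input` and `ensemble_score`, the padding value, the input length, the use of score averaging, and the two stand-in scorers are all assumptions made for illustration; trained CNNs would take the place of the stand-ins.

```python
from typing import Callable, List, Sequence

PAD_VALUE = 256  # assumed padding token, chosen outside the 0-255 byte range
MAX_LEN = 16     # assumed input length; a real model would read far more bytes


def bytes_to_input(data: bytes, max_len: int = MAX_LEN) -> List[int]:
    """Truncate or pad raw file bytes to a fixed-length integer sequence."""
    values = list(data[:max_len])
    return values + [PAD_VALUE] * (max_len - len(values))


def ensemble_score(
    models: Sequence[Callable[[List[int]], float]], x: List[int]
) -> float:
    """Average the scores of several models (simple soft voting, an assumption)."""
    if not models:
        raise ValueError("ensemble needs at least one model")
    return sum(model(x) for model in models) / len(models)


# Stand-in scorers with no real meaning; trained CNNs would go here.
def toy_model_a(x: List[int]) -> float:
    return 0.9 if x[:4] == list(b"%PDF") else 0.1


def toy_model_b(x: List[int]) -> float:
    return 0.5


sample = bytes_to_input(b"%PDF-1.4\n")
score = ensemble_score([toy_model_a, toy_model_b], sample)
print(sample, score)
```

The point of the sketch is that the only preprocessing is turning the file into a fixed-length sequence of byte values; everything else is left to the models and to whatever rule combines their outputs.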

Related papers

Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks [43.26879314353337]
This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system. By changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate.
arXiv Detail & Related papers (2025-10-02T05:04:44Z)
One-Class Intrusion Detection with Dynamic Graphs [46.453758431767724]
Machine learning-based intrusion detection constitutes a promising approach for improving security. We propose a novel intrusion detection method, TGN-SVDD, which builds upon modern dynamic graph modelling and deep anomaly detection. We demonstrate its superiority over several baselines for realistic intrusion detection data and suggest a more challenging variant of the latter.
arXiv Detail & Related papers (2025-08-18T12:36:55Z)
Towards Novel Malicious Packet Recognition: A Few-Shot Learning Approach [0.0]
Deep Packet Inspection (DPI) has emerged as a key technology in strengthening network security. This study proposes a novel approach that leverages a large language model (LLM) and few-shot learning. Our approach shows promising results with an average accuracy of 86.35% and F1-Score of 86.40% on different malware types.
arXiv Detail & Related papers (2024-09-17T15:02:32Z)
Online Clustering of Known and Emerging Malware Families [1.2289361708127875]
It is essential to categorize malware samples according to their malicious characteristics. Online clustering algorithms help us to understand malware behavior and produce a quicker response to new threats. This paper introduces a novel machine learning-based model for the online clustering of malicious samples into malware families.
arXiv Detail & Related papers (2024-05-06T09:20:17Z)
A Feature Set of Small Size for the PDF Malware Detection [8.282177703075451]
We propose a small feature set that doesn't require much domain knowledge of the PDF file. We report a best accuracy of 99.75% when using a Random Forest model. Despite its modest size, we obtain results comparable to state-of-the-art approaches that employ a much larger set of features.
arXiv Detail & Related papers (2023-08-09T04:51:28Z)
An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
Adversarial Networks and Machine Learning for File Classification [0.0]
Correctly identifying the type of file under examination is a critical part of a forensic investigation. We propose using an adversarially-trained machine learning neural network to determine a file's true type. Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types.
arXiv Detail & Related papers (2023-01-27T19:40:03Z)
HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis [16.224649756613655]
Malicious PDF documents present a serious threat to various security organizations. State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware. In this paper, we derive a simple yet effective holistic approach to PDF malware detection.
arXiv Detail & Related papers (2021-11-08T18:32:47Z)
Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication. The influence of embedding reversible watermarking on the classification performance is less than 0.5%. At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) have 'structural malware' (i.e., compromised weights and activation pathways). It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data). Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, 'fingerprints' its nonlinearity, and allows us to detect backdoors (if present). Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus
arXiv Detail & Related papers (2020-07-31T23:52:58Z)
Automating Botnet Detection with Graph Neural Networks [106.24877728212546]
Botnets are now a major source for many network attacks, such as DDoS attacks and spam. In this paper, we consider the neural network design challenges of using modern deep learning techniques to learn policies for botnet detection automatically.
arXiv Detail & Related papers (2020-03-13T15:34:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.