Adversarial Networks and Machine Learning for File Classification
- URL: http://arxiv.org/abs/2301.11964v1
- Date: Fri, 27 Jan 2023 19:40:03 GMT
- Title: Adversarial Networks and Machine Learning for File Classification
- Authors: Ken St. Germain, Josh Angichiodo
- Abstract summary: Correctly identifying the type of file under examination is a critical part of a forensic investigation.
We propose using an adversarially-trained machine learning neural network to determine a file's true type.
Our semi-supervised generative adversarial network (SGAN) achieved 97.6% accuracy in classifying files across 11 different types.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Correctly identifying the type of file under examination is a critical part
of a forensic investigation. The file type alone suggests the embedded content,
such as a picture, video, manuscript, spreadsheet, etc. In cases where a system
owner might desire to keep their files inaccessible or file type concealed, we
propose using an adversarially-trained machine learning neural network to
determine a file's true type even if the extension or file header is obfuscated
to complicate its discovery. Our semi-supervised generative adversarial network
(SGAN) achieved 97.6% accuracy in classifying files across 11 different types.
We also compared our network against a traditional standalone neural network
and three other machine learning algorithms. The adversarially-trained network
proved to be the most precise file classifier especially in scenarios with few
supervised samples available. Our implementation of a file classifier using an
SGAN is available on GitHub (https://ksaintg.github.io/SGAN-File-Classier).
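To illustrate how a file can be classified from its content rather than its extension or header, here is a minimal sketch of one common content-based feature: a normalized byte-frequency histogram. The abstract does not state the paper's input representation, so this is an assumption and not the authors' method. The function name `byte_histogram` and the `skip` parameter are hypothetical.

```python
from collections import Counter


def byte_histogram(data: bytes, skip: int = 0) -> list[float]:
    """Return the normalized frequency of each byte value (0-255).

    `skip` (hypothetical parameter) drops that many leading bytes so a
    forged or stripped header does not influence the features.
    """
    body = data[skip:]
    if not body:
        # No content left after skipping: return an all-zero vector.
        return [0.0] * 256
    counts = Counter(body)  # iterating over bytes yields ints 0-255
    total = len(body)
    return [counts.get(value, 0) / total for value in range(256)]


# Usage: a fake "file" whose first 4 bytes imitate a header.
sample = b"\x89PNG" + b"hello world" * 10
features = byte_histogram(sample, skip=4)
print(len(features), round(sum(features), 6))
```

A 256-element vector like this could serve as the input to any classifier, including the discriminator of a semi-supervised GAN. Whether the network in the paper uses such features is not stated here.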
Related papers
- Bytes Are All You Need: Transformers Operating Directly On File Bytes [55.81123238702553]
We investigate modality-independent representation learning by performing classification on file bytes, without the need for decoding files at inference time.
Our model, ByteFormer, improves ImageNet Top-1 classification accuracy by 5%.
We demonstrate that the same ByteFormer architecture can perform audio classification without modifications or modality-specific preprocessing.
arXiv Detail & Related papers (2023-05-31T23:18:21Z)
- DOC-NAD: A Hybrid Deep One-class Classifier for Network Anomaly Detection [0.0]
Machine Learning approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs).
Recent work has achieved near-perfect performance on binary- and multi-class network anomaly detection tasks.
This paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples.
arXiv Detail & Related papers (2022-12-15T00:08:05Z)
- Logits are predictive of network type [47.64219291655723]
It is possible to predict which deep network has generated a given logit vector with accuracy well above chance.
We utilize a number of networks on a dataset, with random weights or pretrained weights, as well as fine-tuned networks.
arXiv Detail & Related papers (2022-11-04T05:53:27Z)
- Anomaly Detection via Federated Learning [3.0755847416657613]
We propose a novel anomaly detector via federated learning to detect malicious network activity on a client's server.
By using our novel min-max scalar and sampling technique, called FedSam, we determined that federated learning allows the global model to learn from each client's data.
arXiv Detail & Related papers (2022-10-12T22:40:29Z)
- Deep Learning for Network Traffic Classification [0.0]
Monitoring network traffic to identify content, services, and applications is an active research topic in network traffic control systems.
Previous work has identified machine learning methods that may enable application and service identification.
We propose a classification technique using an ensemble of deep learning architectures on packet, payload, and inter-arrival time sequences.
arXiv Detail & Related papers (2021-06-02T04:11:32Z)
- Content-Based Textual File Type Detection at Scale [0.0]
Programming language detection is a common need in the analysis of large source code bases.
We consider the problem of accurately detecting the type of files commonly found in software code bases, based solely on textual file content.
arXiv Detail & Related papers (2021-01-21T09:08:42Z)
- Detecting malicious PDF using CNN [46.86114958340962]
Malicious PDF files represent one of the biggest threats to computer security.
We propose a novel algorithm that uses an ensemble of Convolutional Neural Networks (CNNs) on the byte level of the file.
We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
arXiv Detail & Related papers (2020-07-24T18:27:45Z)
- Many-Class Few-Shot Learning on Multi-Granularity Class Hierarchy [57.68486382473194]
We study the many-class few-shot (MCFS) problem in both supervised learning and meta-learning settings.
In this paper, we leverage the class hierarchy as prior knowledge to train a coarse-to-fine classifier.
The model, "memory-augmented hierarchical-classification network (MahiNet)", performs coarse-to-fine classification where each coarse class can cover multiple fine classes.
arXiv Detail & Related papers (2020-06-28T01:11:34Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks of different backbones so as to learn the features that can perform excellently in both of the aforementioned two classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)
- OSLNet: Deep Small-Sample Classification with an Orthogonal Softmax Layer [77.90012156266324]
This paper aims to find a subspace of neural networks that can facilitate a large decision margin.
We propose the Orthogonal Softmax Layer (OSL), which makes the weight vectors in the classification layer remain orthogonal during both the training and test processes.
Experimental results demonstrate that the proposed OSL has better performance than the methods used for comparison on four small-sample benchmark datasets.
arXiv Detail & Related papers (2020-04-20T02:41:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.