Dataset Optimization Strategies for MalwareTraffic Detection
- URL: http://arxiv.org/abs/2009.11347v1
- Date: Wed, 23 Sep 2020 19:27:22 GMT
- Title: Dataset Optimization Strategies for MalwareTraffic Detection
- Authors: Ivan Letteri, Antonio Di Cecco, Giuseppe Della Penna
- Abstract summary: We propose two novel dataset optimization strategies which exploit and combine several state-of-the-art approaches.
The first approach is a feature selection technique based on mutual information measures and sensibility enhancement.
The second is a dimensional reduction technique based autoencoders.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning is rapidly becoming one of the most important technology for
malware traffic detection, since the continuous evolution of malware requires a
constant adaptation and the ability to generalize. However, network traffic
datasets are usually oversized and contain redundant and irrelevant
information, and this may dramatically increase the computational cost and
decrease the accuracy of most classifiers, with the risk to introduce further
noise.
We propose two novel dataset optimization strategies which exploit and
combine several state-of-the-art approaches in order to achieve an effective
optimization of the network traffic datasets used to train malware detectors.
The first approach is a feature selection technique based on mutual information
measures and sensibility enhancement. The second is a dimensional reduction
technique based autoencoders. Both these approaches have been experimentally
applied on the MTA-KDD'19 dataset, and the optimized results evaluated and
compared using a Multi Layer Perceptron as machine learning model for malware
detection.
Related papers
- Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter
Selection Strategy based on Sharp Asymptotic Analysis [4.178980693837599]
Transfer learning techniques aim to leverage information from multiple related datasets to enhance prediction quality against a target dataset.
Some Lasso-based algorithms have been invented: Trans-Lasso and Pretraining Lasso.
We conduct a thorough, precise study of the algorithm in a high-dimensional setting via an analysis using the replica method.
Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance.
arXiv Detail & Related papers (2024-09-26T10:20:59Z) - Towards Novel Malicious Packet Recognition: A Few-Shot Learning Approach [0.0]
Deep Packet Inspection (DPI) has emerged as a key technology in strengthening network security.
This study proposes a novel approach that leverages a large language model (LLM) and few-shot learning.
Our approach shows promising results with an average accuracy of 86.35% and F1-Score of 86.40% on different malware types.
arXiv Detail & Related papers (2024-09-17T15:02:32Z) - Systematic review, analysis, and characterisation of malicious industrial network traffic datasets for aiding Machine Learning algorithm performance testing [0.0]
This paper systematically reviews publicly available network traffic capture-based datasets.
It includes categorisation of contained attack types, review of metadata, and statistical as well as complexity analysis.
It provides researchers with metadata that can be used to select the best dataset for their research question.
arXiv Detail & Related papers (2024-05-08T07:48:40Z) - Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection [0.0]
This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets.
By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models were able to achieve a better adversarially robust generalization.
arXiv Detail & Related papers (2024-04-05T16:01:21Z) - Defect Classification in Additive Manufacturing Using CNN-Based Vision
Processing [76.72662577101988]
This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model.
This allows the construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data.
arXiv Detail & Related papers (2023-07-14T14:36:58Z) - A Robust and Explainable Data-Driven Anomaly Detection Approach For
Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer.
The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data.
A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z) - Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z) - A comparative study of neural network techniques for automatic software
vulnerability detection [9.443081849443184]
Most commonly used method for detecting software vulnerabilities is static analysis.
Some researchers have proposed to use neural networks that have the ability of automatic feature extraction to improve intelligence of detection.
We have conducted extensive experiments to test the performance of the two most typical neural networks.
arXiv Detail & Related papers (2021-04-29T01:47:30Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Bayesian Optimization with Machine Learning Algorithms Towards Anomaly
Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in term of accuracy rate, precision, low-false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.