Malware Traffic Classification: Evaluation of Algorithms and an
Automated Ground-truth Generation Pipeline
- URL: http://arxiv.org/abs/2010.11627v2
- Date: Sat, 7 Nov 2020 11:51:07 GMT
- Title: Malware Traffic Classification: Evaluation of Algorithms and an
Automated Ground-truth Generation Pipeline
- Authors: Syed Muhammad Kumail Raza and Juan Caballero
- Abstract summary: We propose an automated packet data-labeling pipeline to generate ground-truth data.
We explore and test different kind of clustering approaches which make use of unique and diverse set of features extracted from this observable meta-data.
- Score: 8.779666771357029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying threats in a network traffic flow which is encrypted is uniquely
challenging. On one hand it is extremely difficult to simply decrypt the
traffic due to modern encryption algorithms. On the other hand, passing such an
encrypted stream through pattern matching algorithms is useless because
encryption ensures there aren't any. Moreover, evaluating such models is also
difficult due to lack of labeled benign and malware datasets. Other approaches
have tried to tackle this problem by employing observable meta-data gathered
from the flow. We try to augment this approach by extending it to a
semi-supervised malware classification pipeline using these observable
meta-data. To this end, we explore and test different kind of clustering
approaches which make use of unique and diverse set of features extracted from
this observable meta-data. We also, propose an automated packet data-labeling
pipeline to generate ground-truth data which can serve as a base-line to
evaluate the classifiers mentioned above in particular, or any other detection
model in general.
Related papers
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - Feature Mining for Encrypted Malicious Traffic Detection with Deep
Learning and Other Machine Learning Algorithms [7.404682407709988]
The popularity of encryption mechanisms poses a great challenge to malicious traffic detection.
Traditional detection techniques cannot work without the decryption of encrypted traffic.
In this paper, we provide an in-depth analysis of traffic features and compare different state-of-the-art traffic feature creation approaches.
We propose a novel concept for encrypted traffic feature which is specifically designed for encrypted malicious traffic analysis.
arXiv Detail & Related papers (2023-04-07T15:25:36Z) - Hard Regularization to Prevent Deep Online Clustering Collapse without
Data Augmentation [65.268245109828]
Online deep clustering refers to the joint use of a feature extraction network and a clustering model to assign cluster labels to each new data point or batch as it is processed.
While faster and more versatile than offline methods, online clustering can easily reach the collapsed solution where the encoder maps all inputs to the same point and all are put into a single cluster.
We propose a method that does not require data augmentation, and that, differently from existing methods, regularizes the hard assignments.
arXiv Detail & Related papers (2023-03-29T08:23:26Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding public-available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z) - Using Topological Data Analysis to classify Encrypted Bits [0.0]
Persistent homology is applied to generate topological features of a point cloud obtained from sets of encryptions.
We see that this machine learning pipeline is able to classify our data successfully where classical models of machine learning fail to perform the task.
arXiv Detail & Related papers (2023-01-18T09:43:00Z) - Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
arXiv Detail & Related papers (2022-10-04T15:22:39Z) - Unsupervised Abnormal Traffic Detection through Topological Flow
Analysis [1.933681537640272]
topological connectivity component of a malicious flow is less exploited.
We present a simple method that facilitate the use of connectivity graph features in unsupervised anomaly detection algorithms.
arXiv Detail & Related papers (2022-05-14T18:52:49Z) - Machine Learning for Encrypted Malicious Traffic Detection: Approaches,
Datasets and Comparative Study [6.267890584151111]
In post-COVID-19 environment, malicious traffic encryption is growing rapidly.
We formulate a universal framework of machine learning based encrypted malicious traffic detection techniques.
We implement and compare 10 encrypted malicious traffic detection algorithms.
arXiv Detail & Related papers (2022-03-17T14:00:55Z) - Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches contribute to exploring the specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - DNS Covert Channel Detection via Behavioral Analysis: a Machine Learning
Approach [0.09176056742068815]
We propose an effective covert channel detection method based on the analysis of DNS network data passively extracted from a network monitoring system.
The proposed solution has been evaluated over a 15-day-long experimental session with the injection of traffic that covers the most relevant exfiltration and tunneling attacks.
arXiv Detail & Related papers (2020-10-04T13:28:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.