Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA
Detection Classifiers
- URL: http://arxiv.org/abs/2007.00300v1
- Date: Wed, 1 Jul 2020 07:51:12 GMT
- Title: Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA
Detection Classifiers
- Authors: Arthur Drichel, Ulrike Meyer, Samuel Schüppen, Dominik Teubert
- Abstract summary: It is unclear whether the inclusion of DGAs for which only a few samples are known to the training sets is beneficial or harmful to the overall performance of the classifiers.
In this paper, we perform a comprehensive analysis of various contextless DGA classifiers, which reveals the high value of a few training samples per class for both classification tasks.
- Score: 3.0969191504482243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous machine learning classifiers have been proposed for binary
classification of domain names as either benign or malicious, and even for
multiclass classification to identify the domain generation algorithm (DGA)
that generated a specific domain name. Both classification tasks have to deal
with the class imbalance problem of strongly varying amounts of training
samples per DGA. Currently, it is unclear whether the inclusion of DGAs for
which only a few samples are known to the training sets is beneficial or
harmful to the overall performance of the classifiers. In this paper, we
perform a comprehensive analysis of various contextless DGA classifiers, which
reveals the high value of a few training samples per class for both
classification tasks. We demonstrate that the classifiers are able to detect
various DGAs with high probability by including the underrepresented classes
which were previously hardly recognizable. Simultaneously, we show that the
classifiers' detection capabilities of well represented classes do not
decrease.
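To make the class imbalance problem described in the abstract concrete, here is a minimal Python sketch. It is not the authors' evaluation code; the label names (`dga_a`, `dga_rare`), the sample counts, and the always-majority classifier are made-up assumptions for illustration. It shows how overall accuracy can stay high while an underrepresented DGA class is not recognized at all, which is why per-class metrics matter for this kind of analysis.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for labels that were never predicted correctly.
    return {label: hits[label] / total[label] for label in total}


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls (each class counts equally)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up data: 95 samples of a well represented class and
# 5 samples of an underrepresented class (both names are hypothetical).
y_true = ["dga_a"] * 95 + ["dga_rare"] * 5
# Hypothetical classifier that never predicts the rare class.
y_pred = ["dga_a"] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("per-class recall:", per_class_recall(y_true, y_pred))
print("macro recall:", macro_recall(y_true, y_pred))
```

In this toy setting the accuracy is dominated by the large class, while the macro-averaged recall exposes that the rare class is missed entirely.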
Related papers
- A Multi-Class SWAP-Test Classifier [0.0]
This work presents the first multi-class SWAP-Test classifier inspired by its binary predecessor and the use of label states in recent work.
In contrast to previous work, the number of qubits required, the measurement strategy, and the topology of the circuits used is invariant to the number of classes.
Both analytical results and numerical simulations show that this classifier is not only effective when applied to diverse classification problems but also robust to certain conditions of noise.
arXiv Detail & Related papers (2023-02-06T18:31:43Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novelty detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Detecting Unknown DGAs without Context Information [3.8424737607413153]
New malware often incorporates Domain Generation Algorithms (DGAs) to avoid blocking the malware's connection to the command and control (C2) server.
Current state-of-the-art classifiers are able to separate benign from malicious domains (binary classification) and attribute them with high probability to the DGAs that generated them (multiclass classification).
While binary classifiers can label domains of yet unknown DGAs as malicious, multiclass classifiers can only assign domains to DGAs that are known at the time of training, limiting the ability to uncover new malware families.
arXiv Detail & Related papers (2022-05-30T09:08:50Z)
- First Step Towards EXPLAINable DGA Multiclass Classification [0.6767885381740952]
Malware families rely on domain generation algorithms (DGAs) to establish a connection to their command and control (C2) server.
In this paper, we propose EXPLAIN, a feature-based and contextless DGA multiclass classifier.
arXiv Detail & Related papers (2021-06-23T12:05:13Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first asymptotically precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
- Domain Adaptation with Auxiliary Target Domain-Oriented Classifier [115.39091109079622]
Domain adaptation aims to transfer knowledge from a label-rich but heterogeneous domain to a label-scarce domain.
One of the most popular semi-supervised learning (SSL) techniques is pseudo-labeling, which assigns a pseudo label to each unlabeled sample.
We propose a new pseudo-labeling framework called Auxiliary Target Domain-Oriented Classifier (ATDOC).
ATDOC alleviates the bias by introducing an auxiliary classifier for target data only, to improve the quality of pseudo labels.
arXiv Detail & Related papers (2020-07-08T15:01:35Z)
- Analyzing the Real-World Applicability of DGA Classifiers [3.0969191504482243]
We propose a novel classifier for separating benign domains from domains generated by DGAs.
We evaluate their classification performance and compare them with respect to explainability, robustness, and training and classification speed.
Our newly proposed binary classifier generalizes well to other networks, is time-robust, and able to identify previously unknown DGAs.
arXiv Detail & Related papers (2020-06-19T12:34:05Z)