Related papers: Detecting Unknown DGAs without Context Information

URL: http://arxiv.org/abs/2205.14940v1
Date: Mon, 30 May 2022 09:08:50 GMT
Title: Detecting Unknown DGAs without Context Information
Authors: Arthur Drichel, Justus von Brandt, Ulrike Meyer
Abstract summary: New malware often incorporates Domain Generation Algorithms (DGAs) to avoid blocking the malware's connection to the command and control (C2) server. Current state-of-the-art classifiers are able to separate benign from malicious domains (binary classification) and attribute them with high probability to the DGAs that generated them (multiclass classification). While binary classifiers can label domains of yet unknown DGAs as malicious, multiclass classifiers can only assign domains to DGAs that are known at the time of training, limiting the ability to uncover new malware families.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: New malware emerges at a rapid pace and often incorporates Domain Generation Algorithms (DGAs) to avoid blocking the malware's connection to the command and control (C2) server. Current state-of-the-art classifiers are able to separate benign from malicious domains (binary classification) and attribute them with high probability to the DGAs that generated them (multiclass classification). While binary classifiers can label domains of yet unknown DGAs as malicious, multiclass classifiers can only assign domains to DGAs that are known at the time of training, limiting the ability to uncover new malware families. In this work, we perform a comprehensive study on the detection of new DGAs, which includes an evaluation of 59,690 classifiers. We examine four different approaches in 15 different configurations and propose a simple yet effective approach based on the combination of a softmax classifier and regular expressions (regexes) to detect multiple unknown DGAs with high probability. At the same time, our approach retains state-of-the-art classification performance for known DGAs. Our evaluation is based on a leave-one-group-out cross-validation with a total of 94 DGA families. By using the maximum number of known DGAs, our evaluation scenario is particularly difficult and close to the real world. All of the approaches examined are privacy-preserving, since they operate without context and exclusively on a single domain to be classified. We round up our study with a thorough discussion of class-incremental learning strategies that can adapt an existing classifier to newly discovered classes.
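The abstract does not say exactly how the softmax classifier and the regexes are combined. A minimal sketch, assuming one hand-written regex per known DGA family that must also match a domain before the classifier's top prediction is accepted, might look like the code below. Domains that fail this check are flagged as a possibly unknown DGA. The family names, patterns, `classify_with_regex`, and the optional `min_confidence` gate are all hypothetical illustrations, not the authors' code. The logits stand in for the output of an already trained multiclass model.

```python
from __future__ import annotations

import math
import re

UNKNOWN = "unknown"

# Hypothetical known families and illustrative patterns (not real DGA regexes).
FAMILIES: list[str] = ["family_a", "family_b"]
REGEXES: dict[str, re.Pattern[str]] = {
    "family_a": re.compile(r"[a-z]{12}\.com"),
    "family_b": re.compile(r"[0-9a-f]{16}\.net"),
}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_regex(
    domain: str,
    logits: list[float],
    families: list[str] = FAMILIES,
    regexes: dict[str, re.Pattern[str]] = REGEXES,
    min_confidence: float = 0.0,
) -> str:
    """Attribute `domain` to a known family, or flag it as a possibly new DGA."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    family = families[best]
    # Optional confidence gate (disabled by default with 0.0).
    if probs[best] < min_confidence:
        return UNKNOWN
    # Regex gate: the predicted family's pattern must match the whole domain.
    if regexes[family].fullmatch(domain) is None:
        return UNKNOWN
    return family


if __name__ == "__main__":
    print(classify_with_regex("qwertyuiopas.com", [2.0, 0.5]))   # family_a
    print(classify_with_regex("x9-unseen-dga.org", [2.0, 0.5]))  # unknown
```

In this sketch, the classifier still decides which known family is most likely. The regex only vetoes predictions whose structure does not fit that family. Correct attributions of known DGAs therefore pass through unchanged, while domains from an unseen generator can be surfaced as unknown.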

Related papers

Towards Robust Domain Generation Algorithm Classification
We implement 32 white-box attacks, 19 of which are very effective and induce a false-negative rate (FNR) of ≈100% on unhardened classifiers. We propose a novel training scheme that leverages adversarial latent space vectors and discretized adversarial domains to significantly improve robustness.
arXiv (2024-04-09T11:56:29Z)
Activate and Reject: Towards Safe Domain Generalization under Category Shift
We study a practical problem of Domain Generalization under Category Shift (DGCS). It aims to simultaneously detect unknown-class samples and classify known-class samples in the target domains. Compared to prior DG works, we face two new challenges: 1) how to learn the concept of "unknown" during training with only source known-class samples, and 2) how to adapt the source-trained model to unseen environments.
arXiv (2023-10-07T07:53:12Z)
Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment
Domain adaptive detection aims to improve the generalization of detectors on the target domain. Recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning. We introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning.
arXiv (2023-01-01T08:38:07Z)
Prior Knowledge Guided Unsupervised Domain Adaptation
We propose a Knowledge-guided Unsupervised Domain Adaptation (KUDA) setting where prior knowledge about the target class distribution is available. In particular, we consider two specific types of prior knowledge about the class distribution in the target domain: Unary Bound and Binary Relationship. We propose a rectification module that uses such prior knowledge to refine model generated pseudo labels.
arXiv (2022-07-18T18:41:36Z)
First Step Towards EXPLAINable DGA Multiclass Classification
Malware families rely on domain generation algorithms (DGAs) to establish a connection to their command and control (C2) server. In this paper, we propose EXPLAIN, a feature-based and contextless DGA multiclass classifier.
arXiv (2021-06-23T12:05:13Z)
Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
We propose a new approach for binary classification from m U-sets for m ≥ 2. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv (2021-02-01T07:36:38Z)
Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency
Visual domain adaptation involves learning to classify images from a target visual domain using labels available in a different source domain. We show that in the presence of a few target labels, simple techniques like self-supervision (via rotation prediction) and consistency regularization can be effective without any adversarial alignment to learn a good target classifier. Our Pretraining and Consistency (PAC) approach can achieve state-of-the-art accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods across multiple datasets.
arXiv (2021-01-29T18:40:17Z)
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification. Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both unsupervised ReID and unsupervised domain adaptive ReID tasks.
arXiv (2020-07-21T14:31:27Z)
Making Use of NXt to Nothing: The Effect of Class Imbalances on DGA Detection Classifiers
It is unclear whether including DGAs for which only a few samples are known in the training sets is beneficial or harmful to the overall performance of the classifiers. In this paper, we perform a comprehensive analysis of various contextless DGA classifiers, which reveals the high value of a few training samples per class for both classification tasks.
arXiv (2020-07-01T07:51:12Z)
Analyzing the Real-World Applicability of DGA Classifiers
We propose a novel classifier for separating benign domains from domains generated by DGAs. We evaluate the classification performance of the examined classifiers and compare them with respect to explainability, robustness, and training and classification speed. Our newly proposed binary classifier generalizes well to other networks, is time-robust, and is able to identify previously unknown DGAs.
arXiv (2020-06-19T12:34:05Z)
Inline Detection of DGA Domains Using Side Information
Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names. In recent years, machine learning based systems have been widely used to detect DGAs. We train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself.
arXiv (2020-03-12T11:00:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.