Improving DGA-Based Malicious Domain Classifiers for Malware Defense with Adversarial Machine Learning
- URL: http://arxiv.org/abs/2101.00521v1
- Date: Sat, 2 Jan 2021 22:04:22 GMT
- Title: Improving DGA-Based Malicious Domain Classifiers for Malware Defense with Adversarial Machine Learning
- Authors: Ibrahim Yilmaz, Ambareen Siraj, Denis Ulybyshev
- Abstract summary: Domain Generation Algorithms (DGAs) are used by adversaries to establish Command and Control (C&C) server communications during cyber attacks.
Blacklists of known/identified C&C domains are often used as one of the defense mechanisms.
We propose a new method using adversarial machine learning to generate never-before-seen malware-related domain families.
- Score: 0.9023847175654603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Domain Generation Algorithms (DGAs) are used by adversaries to establish
Command and Control (C&C) server communications during cyber attacks.
Blacklists of known/identified C&C domains are often used as one of the
defense mechanisms. However, since blacklists are static and generated by
signature-based approaches, they can neither keep up with newly generated
domains nor detect never-before-seen malicious domain names. Due to this
shortcoming of blacklist domain checking, machine learning algorithms have been
used to address the problem to some extent. However, when training is performed
with limited datasets, the algorithms are likely to fail to detect new DGA
variants. To mitigate this weakness, we successfully applied a DGA-based
malicious domain classifier using the Long Short-Term Memory (LSTM) method with
a novel feature engineering technique. Our model achieves higher accuracy than
a model reported in prior research. Additionally, we propose a new method using
adversarial machine learning to generate never-before-seen malware-related
domain families that can be used to illustrate the shortcomings of machine
learning algorithms in this regard. Next, we augment the training dataset with
new samples so that the trained machine learning models are more effective at
detecting never-before-seen malicious domain name variants. Finally, to protect
blacklists of malicious domain names from disclosure and tampering, we devise
secure data containers that store blacklists and guarantee their protection
against adversarial access and modification.
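The abstract's point that static blacklists cannot catch newly generated domains can be illustrated with a toy example. This is a sketch under stated assumptions only: `toy_dga` is a hypothetical seed-and-date DGA (not any real malware family's algorithm and not the authors' generator), and the "blacklist" is simply yesterday's output stored in a Python set and checked by exact match.

```python
import hashlib
from datetime import date, timedelta


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12, tld: str = ".com") -> list[str]:
    """Hypothetical DGA: derive `count` pseudo-random domains from a seed and a date."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode("utf-8")).hexdigest()
        # Map each hex digit (0-15) to a letter a-p to build the second-level label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains


def is_blacklisted(domain: str, blacklist: set[str]) -> bool:
    """Exact-match lookup, as a static blacklist would perform."""
    return domain.lower() in blacklist


# The blacklist is built from domains observed yesterday.
yesterday = date(2021, 1, 1)
today = yesterday + timedelta(days=1)
blacklist = set(toy_dga("example-seed", yesterday))

# Today the same (hypothetical) malware derives a fresh set of domains from the same seed.
todays_domains = toy_dga("example-seed", today)
caught = [d for d in todays_domains if is_blacklisted(d, blacklist)]
print(f"{len(caught)} of {len(todays_domains)} new domains caught by the static blacklist")
```

Because the generated names change with the date while the blacklist only holds names already observed, exact-match lookup misses them; that gap is what motivates the learning-based classifiers discussed in the abstract.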
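The abstract does not describe the authors' novel feature engineering or their adversarial generator, so the following is only a generic sketch of two surrounding steps: encoding domains as fixed-length character-index sequences (a common input format for LSTM-based DGA classifiers) and appending generated domains to the training set labeled as malicious. The vocabulary, `max_len`, the label convention (1 = malicious), the example domains, and the helper names `encode_domain` and `augment` are all assumptions for illustration.

```python
import string

# Assumed vocabulary; index 0 is reserved for padding.
VOCAB = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}
UNKNOWN_INDEX = len(VOCAB) + 1  # shared index for any character outside VOCAB


def encode_domain(domain: str, max_len: int = 32) -> list[int]:
    """Encode a domain as max_len character indices, left-padded with 0.

    Longer domains keep only their last max_len characters, so the TLD survives.
    """
    lowered = domain.lower()
    tail = lowered[max(0, len(lowered) - max_len):]
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in tail]
    return [0] * (max_len - len(indices)) + indices


def augment(train: list[tuple[str, int]], generated: list[str]) -> list[tuple[str, int]]:
    """Append generated domains labeled malicious (1), skipping names already present."""
    seen = {domain for domain, _ in train}
    extra = []
    for domain in generated:
        if domain not in seen:
            extra.append((domain, 1))
            seen.add(domain)
    return train + extra


# Hypothetical data: label 0 = benign, 1 = malicious.
train = [("example.com", 0), ("xjwqkzpdfa.net", 1)]
generated = ["qpwlmzrtbn.com", "xjwqkzpdfa.net"]
augmented = augment(train, generated)
X = [encode_domain(d) for d, _ in augmented]
y = [label for _, label in augmented]
```

`X` and `y` could then be fed to any sequence model (for example an embedding layer followed by an LSTM); the generator that produces `generated` is deliberately left out because the abstract does not specify it.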
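The abstract does not say how its secure data containers are built. As a plainly swapped-in stand-in for the tamper-detection part only, the sketch below seals a serialized blacklist with an HMAC-SHA256 tag using Python's standard `hmac` module and rejects the container on load if the tag does not match. It provides no confidentiality (protection from disclosure would additionally require encryption), and `seal_blacklist`, `open_blacklist`, and the key are hypothetical.

```python
import hashlib
import hmac
import json


def seal_blacklist(domains: set[str], key: bytes) -> dict[str, str]:
    """Serialize the blacklist deterministically and attach an HMAC-SHA256 tag."""
    payload = json.dumps(sorted(domains))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def open_blacklist(container: dict[str, str], key: bytes) -> set[str]:
    """Recompute the tag and compare in constant time before trusting the payload."""
    payload = container["payload"]
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, container["tag"]):
        raise ValueError("blacklist container failed integrity check")
    return set(json.loads(payload))


key = b"hypothetical-secret-key"  # assumption: in practice, load from a secrets store
container = seal_blacklist({"bad-domain.com", "evil.net"}, key)
print(sorted(open_blacklist(container, key)))

# Someone who removes an entry without the key cannot produce a matching tag.
tampered = dict(container, payload=json.dumps(["evil.net"]))
try:
    open_blacklist(tampered, key)
except ValueError as err:
    print(err)
```

Sorting before serialization makes the tag independent of set iteration order, so the same blacklist always yields the same tag.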
Related papers
- MASKDROID: Robust Android Malware Detection with Masked Graph Representations [56.09270390096083]
We propose MASKDROID, a powerful detector with a strong discriminative ability to identify malware.
We introduce a masking mechanism into the Graph Neural Network based framework, forcing MASKDROID to recover the whole input graph.
This strategy enables the model to understand the malicious semantics and learn more stable representations, enhancing its robustness against adversarial attacks.
(2024-09-29T07:22:47Z)
- DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification [4.585051136007553]
We introduce DomURLs_BERT, a pre-trained BERT-based encoder for detecting and classifying suspicious/malicious domains and URLs.
The proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets.
(2024-09-13T18:59:13Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
(2023-03-20T17:25:22Z)
- Open SESAME: Fighting Botnets with Seed Reconstructions of Domain Generation Algorithms [0.0]
Bots can generate pseudorandom domain names using Domain Generation Algorithms (DGAs).
A cyber criminal can register such domains to establish periodically changing rendezvous points with the bots.
We introduce SESAME, a system that combines the two above-mentioned approaches and contains a module for automatic Seed Reconstruction.
(2023-01-12T14:25:31Z)
- Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples [128.25509832644025]
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet.
UEs are training samples with invisible but unlearnable noise added, which have been found to prevent unauthorized training of machine learning models.
We present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations.
(2022-12-31T04:26:25Z)
- Explaining Machine Learning DGA Detectors from DNS Traffic Data [11.049278217301048]
This work addresses the problem of Explainable ML in the context of botnet and DGA detection.
It is the first to concretely break down the decisions of ML classifiers when devised for botnet/DGA detection.
(2022-08-10T11:34:26Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against membership inference attacks (MIAs).
(2022-07-12T19:34:47Z)
- Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
(2020-08-17T07:16:57Z)
- Adversarial Machine Learning Attacks and Defense Methods in the Cyber Security Domain [58.30296637276011]
This paper summarizes the latest research on adversarial attacks against security solutions based on machine learning techniques.
It is the first to discuss the unique challenges of implementing end-to-end adversarial attacks in the cyber security domain.
(2020-07-05T18:22:40Z)
- Real-Time Detection of Dictionary DGA Network Traffic using Deep Learning [5.915780927888678]
Botnets and malware avoid detection by static rules engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses.
Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains.
We create a novel hybrid neural network, Bilbo the bagging model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious.
(2020-03-28T14:57:22Z)
- Inline Detection of DGA Domains Using Side Information [5.253305460558346]
Domain Generation Algorithms (DGAs) are popular methods for generating pseudo-random domain names.
In recent years, machine learning based systems have been widely used to detect DGAs.
We train and evaluate state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself.
(2020-03-12T11:00:30Z)