SecureReg: Combining NLP and MLP for Enhanced Detection of Malicious Domain Name Registrations
- URL: http://arxiv.org/abs/2401.03196v3
- Date: Wed, 10 Jul 2024 11:17:50 GMT
- Title: SecureReg: Combining NLP and MLP for Enhanced Detection of Malicious Domain Name Registrations
- Authors: Furkan Çolhak, Mert İlhan Ecevit, Hasan Dağ, Reiner Creutzburg
- Abstract summary: This paper introduces a cutting-edge approach for identifying suspicious domains at the onset of the registration process.
The proposed system analyzes semantic and numerical attributes by combining a pretrained CANINE Natural Language Processing (NLP) model with Multilayer Perceptron (MLP) models.
With an F1 score of 84.86% and an accuracy of 84.95% on the SecureReg dataset, it effectively detects malicious domain registrations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The escalating landscape of cyber threats, characterized by the registration of thousands of new domains daily for large-scale Internet attacks such as spam, phishing, and drive-by downloads, underscores the imperative for innovative detection methodologies. This paper introduces a cutting-edge approach for identifying suspicious domains at the onset of the registration process. The accompanying data pipeline generates crucial features by comparing new domains to registered domains, with particular emphasis on the similarity score. The proposed system analyzes semantic and numerical attributes by leveraging a novel combination of Natural Language Processing (NLP) techniques, including a pretrained CANINE model, and Multilayer Perceptron (MLP) models, providing a robust solution for early threat detection. This integrated pretrained NLP (CANINE) + MLP model shows outstanding performance, surpassing both individual pretrained NLP models and standalone MLP models. With an F1 score of 84.86% and an accuracy of 84.95% on the SecureReg dataset, it effectively detects malicious domain registrations. The findings demonstrate the effectiveness of the integrated approach and contribute to the ongoing efforts to develop proactive strategies to mitigate the risks associated with illicit online activities through the early identification of suspicious domain registrations.
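The abstract says the data pipeline compares each new domain to already registered domains and derives a similarity score, alongside other numerical attributes. The sketch below illustrates that idea only. The abstract does not name the similarity metric or the exact features, so the use of Python's `difflib.SequenceMatcher` and the helper names `similarity`, `max_similarity`, and `numeric_features` are assumptions made for illustration, not the authors' implementation.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical.

    Assumption: SequenceMatcher stands in for whatever metric the paper uses.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def max_similarity(new_domain: str, registered: list[str]) -> tuple[str, float]:
    """Return the closest registered domain and its similarity score."""
    if not registered:
        return "", 0.0
    best = max(registered, key=lambda d: similarity(new_domain, d))
    return best, similarity(new_domain, best)


def numeric_features(new_domain: str, registered: list[str]) -> list[float]:
    """Build a small, hypothetical numeric feature vector for an MLP.

    Features: highest similarity to a registered domain, name length,
    fraction of digit characters, and number of hyphens.
    """
    name = new_domain.lower()
    _, score = max_similarity(name, registered)
    digits = sum(ch.isdigit() for ch in name)
    digit_ratio = digits / len(name) if name else 0.0
    return [score, float(len(name)), digit_ratio, float(name.count("-"))]


if __name__ == "__main__":
    known = ["paypal.com", "google.com"]
    print(max_similarity("paypa1.com", known))
    print(numeric_features("paypa1.com", known))
```

In a setup like the one the abstract describes, a vector of this kind would feed the MLP branch, while the raw domain string would go to the character-level CANINE branch; how the two outputs are fused is not specified in the abstract.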
Related papers
- Training Large Language Models for Advanced Typosquatting Detection [0.0]
Typosquatting is a cyber threat that exploits human error in typing URLs to deceive users, distribute malware, and conduct phishing attacks.
This study introduces a novel approach leveraging large language models (LLMs) to enhance typosquatting detection.
Experimental results indicate that the Phi-4 14B model outperformed the other tested models when properly fine-tuned, achieving a 98% accuracy rate with only a few thousand training samples.
arXiv Detail & Related papers (2025-03-28T13:16:27Z)
- Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting.
Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines.
Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
- NLP-ADBench: NLP Anomaly Detection Benchmark [9.445800367013744]
We introduce NLP-ADBench, the most comprehensive benchmark for NLP anomaly detection.
No single model excels across all datasets, highlighting the need for automated model selection.
Two-step methods leveraging transformer-based embeddings consistently outperform specialized end-to-end approaches.
arXiv Detail & Related papers (2024-12-06T05:30:41Z)
- Palisade -- Prompt Injection Detection Framework [0.9620910657090188]
Large Language Models are vulnerable to malicious prompt injection attacks.
This paper proposes a novel NLP based approach for prompt injection detection.
It emphasizes accuracy and optimization through a layered input screening process.
arXiv Detail & Related papers (2024-10-28T15:47:03Z)
- DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection [2.6217304977339473]
Domain squatting poses a significant threat to Internet security, with attackers employing increasingly sophisticated techniques.
This study introduces DomainLynx, an innovative compound AI system leveraging Large Language Models (LLMs) for enhanced domain squatting detection.
In a month-long real-world test, it detected 34,359 squatting domains from 2.09 million new domains, outperforming baseline methods by 2.5 times.
arXiv Detail & Related papers (2024-10-02T23:32:09Z)
- An Effective Networks Intrusion Detection Approach Based on Hybrid Harris Hawks and Multi-Layer Perceptron [47.81867479735455]
This paper proposes an Intrusion Detection System (IDS) employing the Harris Hawks Optimization (HHO) to optimize Multilayer Perceptron learning.
HHO-MLP aims to select optimal parameters in its learning process to minimize intrusion detection errors in networks.
HHO-MLP showed superior performance, attaining an accuracy rate of 93.17%, a sensitivity level of 95.41%, and a specificity percentage of 95.41%.
arXiv Detail & Related papers (2024-02-21T06:25:50Z)
- Phishing Website Detection through Multi-Model Analysis of HTML Content [0.0]
This study addresses the pressing issue of phishing by introducing an advanced detection model that meticulously focuses on HTML content.
Our proposed approach integrates a specialized Multi-Layer Perceptron (MLP) model for structured tabular data and two pretrained Natural Language Processing (NLP) models for analyzing textual features.
The fusion of two NLP models and one MLP model, termed MultiText-LP, achieves impressive results, yielding a 96.80 F1 score and a 97.18 accuracy score on our research dataset.
arXiv Detail & Related papers (2024-01-09T21:08:13Z)
- NODLINK: An Online System for Fine-Grained APT Attack Detection and Investigation [15.803901489811318]
NodLink is the first online detection system that maintains high detection accuracy without sacrificing detection granularity.
We propose a novel design of in-memory cache, an efficient attack screening method, and a new approximation algorithm that is more efficient than the conventional one in APT attack detection.
arXiv Detail & Related papers (2023-11-04T05:36:59Z)
- SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recognition [9.673505408890435]
Graph convolutional networks (GCNs) have achieved remarkable performance in skeleton-based action recognition.
Previous GCN-based methods rely on elaborate human priors excessively and construct complex feature aggregation mechanisms.
We propose a novel model, SiT-MLP, for skeleton-based action recognition in this work.
arXiv Detail & Related papers (2023-08-30T13:20:54Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Model-tuning Via Prompts Makes NLP Models Adversarially Robust [97.02353907677703]
We show surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP).
MVP improves performance against adversarial substitutions by an average of 8% over standard methods.
We also conduct ablations to investigate the mechanism underlying these gains.
arXiv Detail & Related papers (2023-03-13T17:41:57Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT, against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- DeepGMR: Learning Latent Gaussian Mixture Models for Registration [113.74060941036664]
Point cloud registration is a fundamental problem in 3D computer vision, graphics and robotics.
In this paper, we introduce Deep Gaussian Mixture Registration (DeepGMR), the first learning-based registration method.
Our proposed method shows favorable performance when compared with state-of-the-art geometry-based and learning-based registration methods.
arXiv Detail & Related papers (2020-08-20T17:25:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.