PhishReplicant: A Language Model-based Approach to Detect Generated Squatting Domain Names
- URL: http://arxiv.org/abs/2310.11763v1
- Date: Wed, 18 Oct 2023 07:41:41 GMT
- Title: PhishReplicant: A Language Model-based Approach to Detect Generated Squatting Domain Names
- Authors: Takashi Koide, Naoki Fukushi, Hiroki Nakano, Daiki Chiba
- Abstract summary: Domain squatting is a technique used by attackers to create domain names for phishing sites.
We propose a system called PhishReplicant that detects generated squatting domains (GSDs) by focusing on the linguistic similarity of domain names.
- Score: 2.3999111269325266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain squatting is a technique used by attackers to create domain names for phishing sites. In recent phishing attempts, we have observed many domain names that use multiple techniques to evade existing methods for domain squatting. These domain names, which we call generated squatting domains (GSDs), are quite different in appearance from legitimate domain names and do not contain brand names, making them difficult to associate with phishing. In this paper, we propose a system called PhishReplicant that detects GSDs by focusing on the linguistic similarity of domain names. We analyzed newly registered and observed domain names extracted from certificate transparency logs, passive DNS, and DNS zone files. We detected 3,498 domain names acquired by attackers in a four-week experiment, of which 2,821 were used for phishing sites within a month of detection. We also confirmed that our proposed system outperformed existing systems in both detection accuracy and number of domain names detected. As an in-depth analysis, we examined 205k GSDs collected over 150 days and found that phishing using GSDs was distributed globally. However, attackers intensively targeted brands in specific regions and industries. By analyzing GSDs in real time, we can block phishing sites before or immediately after they appear.
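The abstract says the system compares domain names by linguistic similarity but gives no implementation detail. As a rough illustration of the general idea only, the sketch below flags newly observed names that closely resemble known phishing names. It swaps in character-trigram cosine similarity rather than a language model, so it is not the authors' method. The function names, the threshold, and the `.example` domain names are all hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count character n-grams of a domain name, padded with boundary markers."""
    padded = f"^{name.lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_similar(candidates, known_phishing, threshold=0.5):
    """Return (candidate, closest known name, score) for candidates above the threshold.

    The threshold is an arbitrary illustrative value, not one taken from the paper.
    """
    known = [(d, char_ngrams(d)) for d in known_phishing]
    flagged = []
    for cand in candidates:
        grams = char_ngrams(cand)
        best_name, best_score = max(
            ((d, cosine(grams, g)) for d, g in known), key=lambda pair: pair[1]
        )
        if best_score >= threshold:
            flagged.append((cand, best_name, best_score))
    return flagged


# Hypothetical inputs: one previously confirmed phishing name, two new names.
known_phishing = ["account-verify-login.example"]
new_domains = ["account-verify-logln.example", "weather-report.example"]
flagged = flag_similar(new_domains, known_phishing)
```

Here only the first new name should be flagged, since it differs from the known name by a single character, while the second shares little beyond the `.example` suffix. A real system would need a far richer similarity model and a way to control false positives.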
Related papers
- Registration, Detection, and Deregistration: Analyzing DNS Abuse for Phishing Attacks [2.160481692907504]
Phishing continues to pose a significant cybersecurity threat.
It is essential to address this fundamental challenge at the root, particularly in phishing domains.
Domain registration presents a crucial intervention point, as domains serve as the primary gateway between users and websites.
arXiv Detail & Related papers (2025-02-13T18:02:48Z)
- DomainDynamics: Lifecycle-Aware Risk Timeline Construction for Domain Names [2.6217304977339473]
DomainDynamics is a novel system designed to predict domain name risks by considering their lifecycle stages.
In an experiment involving over 85,000 actual malicious domains from malware and phishing incidents, DomainDynamics achieved an 82.58% detection rate with a low false positive rate of 0.41%.
arXiv Detail & Related papers (2024-10-02T23:33:13Z)
- DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection [2.6217304977339473]
Domain squatting poses a significant threat to Internet security, with attackers employing increasingly sophisticated techniques.
This study introduces DomainLynx, an innovative compound AI system leveraging Large Language Models (LLMs) for enhanced domain squatting detection.
In a month-long real-world test, it detected 34,359 squatting domains from 2.09 million new domains, outperforming baseline methods by 2.5 times.
arXiv Detail & Related papers (2024-10-02T23:32:09Z)
- Don't Get Hijacked: Prevalence, Mitigation, and Impact of Non-Secure DNS Dynamic Updates [1.135267457536642]
DNS dynamic updates represent an inherently vulnerable mechanism.
Non-secure DNS updates leave domains susceptible to a novel form of attack termed zone poisoning.
We undertook a comprehensive campaign involving the notification of Computer Security Incident Response Teams.
arXiv Detail & Related papers (2024-05-30T09:23:53Z)
- Domain Generalization via Causal Adjustment for Cross-Domain Sentiment Analysis [59.73582306457387]
We focus on the problem of domain generalization for cross-domain sentiment analysis.
We propose a backdoor adjustment-based causal model to disentangle the domain-specific and domain-invariant representations.
A series of experiments show the great performance and robustness of our model.
arXiv Detail & Related papers (2024-02-22T13:26:56Z)
- Open SESAME: Fighting Botnets with Seed Reconstructions of Domain Generation Algorithms [0.0]
Bots can generate pseudorandom domain names using Domain Generation Algorithms (DGAs).
A cyber criminal can register such domains to establish periodically changing rendezvous points with the bots.
We introduce SESAME, a system that combines the two above-mentioned approaches and contains a module for automatic Seed Reconstruction.
arXiv Detail & Related papers (2023-01-12T14:25:31Z)
- Learning to Share by Masking the Non-shared for Multi-domain Sentiment Classification [24.153584996936424]
We propose a network which explicitly masks domain-related words from texts, learns domain-invariant sentiment features from these domain-agnostic texts, and uses those masked words to form domain-aware sentence representations.
Empirical experiments on a well-adopted multiple domain sentiment classification dataset demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2021-04-17T08:15:29Z)
- Open Domain Generalization with Domain-Augmented Meta-Learning [83.59952915761141]
We study a novel and practical problem of Open Domain Generalization (OpenDG).
We propose a Domain-Augmented Meta-Learning framework to learn open-domain generalizable representations.
Experiment results on various multi-domain datasets demonstrate that the proposed Domain-Augmented Meta-Learning (DAML) outperforms prior methods for unseen domain recognition.
arXiv Detail & Related papers (2021-04-08T09:12:24Z)
- Batch Normalization Embeddings for Deep Domain Generalization [50.51405390150066]
Domain generalization aims at training machine learning models to perform robustly across different and unseen domains.
We show a significant increase in classification accuracy over current state-of-the-art techniques on popular domain generalization benchmarks.
arXiv Detail & Related papers (2020-11-25T12:02:57Z)
- CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search [89.48123965553098]
This paper presents a search system to alleviate the special domain adaption problem.
The system utilizes the domain-adaptive pretraining and few-shot learning technologies to help neural rankers mitigate the domain discrepancy.
Our system performs the best among the non-manual runs in Round 2 of the TREC-COVID task.
arXiv Detail & Related papers (2020-11-03T09:10:48Z)
- Domain Agnostic Learning for Unbiased Authentication [47.85174796247398]
We propose a domain-agnostic method that eliminates domain-difference without domain labels.
Latent domains are discovered by learning the heterogeneous predictive relationships between inputs and outputs.
We extend our method to a meta-learning framework to pursue more thorough domain-difference elimination.
arXiv Detail & Related papers (2020-10-11T14:05:16Z)
- Cross-domain Self-supervised Learning for Domain Adaptation with Few Source Labels [78.95901454696158]
We propose a novel Cross-Domain Self-supervised learning approach for domain adaptation.
Our method significantly boosts performance of target accuracy in the new target domain with few source labels.
arXiv Detail & Related papers (2020-03-18T15:11:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.