DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection
- URL: http://arxiv.org/abs/2410.02095v2
- Date: Fri, 18 Oct 2024 03:01:03 GMT
- Title: DomainLynx: Leveraging Large Language Models for Enhanced Domain Squatting Detection
- Authors: Daiki Chiba, Hiroki Nakano, Takashi Koide,
- Abstract summary: Domain squatting poses a significant threat to Internet security, with attackers employing increasingly sophisticated techniques.
This study introduces DomainLynx, an innovative compound AI system leveraging Large Language Models (LLMs) for enhanced domain squatting detection.
In a month-long real-world test, it detected 34,359 squatting domains from 2.09 million new domains, outperforming baseline methods by 2.5 times.
- Score: 2.6217304977339473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain squatting poses a significant threat to Internet security, with attackers employing increasingly sophisticated techniques. This study introduces DomainLynx, an innovative compound AI system leveraging Large Language Models (LLMs) for enhanced domain squatting detection. Unlike existing methods focusing on predefined patterns for top-ranked domains, DomainLynx excels in identifying novel squatting techniques and protecting less prominent brands. The system's architecture integrates advanced data processing, intelligent domain pairing, and LLM-powered threat assessment. Crucially, DomainLynx incorporates specialized components that mitigate LLM hallucinations, ensuring reliable and context-aware detection. This approach enables efficient analysis of vast security data from diverse sources, including Certificate Transparency logs, Passive DNS records, and zone files. Evaluated on a curated dataset of 1,649 squatting domains, DomainLynx achieved 94.7\% accuracy using Llama-3-70B. In a month-long real-world test, it detected 34,359 squatting domains from 2.09 million new domains, outperforming baseline methods by 2.5 times. This research advances Internet security by providing a versatile, accurate, and adaptable tool for combating evolving domain squatting threats. DomainLynx's approach paves the way for more robust, AI-driven cybersecurity solutions, enhancing protection for a broader range of online entities and contributing to a safer digital ecosystem.
Related papers
- Training Large Language Models for Advanced Typosquatting Detection [0.0]
Typosquatting is a cyber threat that exploits human error in typing URLs to deceive users, distribute malware, and conduct phishing attacks.
This study introduces a novel approach leveraging large language models (LLMs) to enhance typosquatting detection.
Experimental results indicate that the Phi-4 14B model outperformed other tested models when properly fine tuned achieving a 98% accuracy rate with only a few thousand training samples.
arXiv Detail & Related papers (2025-03-28T13:16:27Z) - Shh, don't say that! Domain Certification in LLMs [124.61851324874627]
Large language models (LLMs) are often deployed to perform constrained tasks, with narrow domains.
We introduce domain certification; a guarantee that accurately characterizes the out-of-domain behavior of language models.
We then propose a simple yet effective approach, which we call VALID that provides adversarial bounds as a certificate.
arXiv Detail & Related papers (2025-02-26T17:13:19Z) - Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z) - Don't Get Hijacked: Prevalence, Mitigation, and Impact of Non-Secure DNS Dynamic Updates [1.135267457536642]
DNS dynamic updates represent an inherently vulnerable mechanism.
Non-secure DNS updates leave domains susceptible to a novel form of attack termed zone poisoning.
We undertook a comprehensive campaign involving the notification of Computer Security Incident Response Teams.
arXiv Detail & Related papers (2024-05-30T09:23:53Z) - SecureReg: Combining NLP and MLP for Enhanced Detection of Malicious Domain Name Registrations [0.0]
This paper introduces a cutting-edge approach for identifying suspicious domains at the onset of the registration process.
The proposed system analyzes semantic and numerical attributes by leveraging a novel combination of Natural Language Processing (NLP) techniques.
With an F1 score of 84.86% and an accuracy of 84.95% on the SecureReg dataset, it effectively detects malicious domain registrations.
arXiv Detail & Related papers (2024-01-06T11:43:57Z) - Revisiting the Domain Shift and Sample Uncertainty in Multi-source
Active Domain Transfer [69.82229895838577]
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.
This setting neglects the more practical scenario where training data are collected from multiple sources.
This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains.
arXiv Detail & Related papers (2023-11-21T13:12:21Z) - PhishReplicant: A Language Model-based Approach to Detect Generated Squatting Domain Names [2.3999111269325266]
Domain squatting is a technique used by attackers to create domain names for phishing sites.
We propose a system called PhishReplicant that detects generated squatting domains (GSDs) by focusing on the linguistic similarity of domain names.
arXiv Detail & Related papers (2023-10-18T07:41:41Z) - ReMask: A Robust Information-Masking Approach for Domain Counterfactual
Generation [16.275230631985824]
Domain counterfactual generation aims to transform a text from the source domain to a given target domain.
We employ a three-step domain obfuscation approach that involves frequency and attention norm-based masking, to mask domain-specific cues, and unmasking to regain the domain generic context.
Our model outperforms the state-of-the-art by achieving 1.4% average accuracy improvement in the adversarial domain adaptation setting.
arXiv Detail & Related papers (2023-05-04T14:19:02Z) - Adversarial Bi-Regressor Network for Domain Adaptive Regression [52.5168835502987]
It is essential to learn a cross-domain regressor to mitigate the domain shift.
This paper proposes a novel method Adversarial Bi-Regressor Network (ABRNet) to seek more effective cross-domain regression model.
arXiv Detail & Related papers (2022-09-20T18:38:28Z) - Domain Adaptation from Scratch [24.612696638386623]
We present a new learning setup, domain adaptation from scratch'', which we believe to be crucial for extending the reach of NLP to sensitive domains.
In this setup, we aim to efficiently annotate data from a set of source domains such that the trained model performs well on a sensitive target domain.
Our study compares several approaches for this challenging setup, ranging from data selection and domain adaptation algorithms to active learning paradigms.
arXiv Detail & Related papers (2022-09-02T05:55:09Z) - A cross-domain recommender system using deep coupled autoencoders [77.86290991564829]
Two novel coupled autoencoder-based deep learning methods are proposed for cross-domain recommendation.
The first method aims to simultaneously learn a pair of autoencoders in order to reveal the intrinsic representations of the items in the source and target domains.
The second method is derived based on a new joint regularized optimization problem, which employs two autoencoders to generate in a deep and non-linear manner the user and item-latent factors.
arXiv Detail & Related papers (2021-12-08T15:14:26Z) - Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training
for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with wide ranges of application potentials.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z) - AFAN: Augmented Feature Alignment Network for Cross-Domain Object
Detection [90.18752912204778]
Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications.
We propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training.
Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations.
arXiv Detail & Related papers (2021-06-10T05:01:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.