A Study of Effectiveness of Brand Domain Identification Features for Phishing Detection in 2025
- URL: http://arxiv.org/abs/2503.06487v1
- Date: Sun, 09 Mar 2025 07:14:04 GMT
- Title: A Study of Effectiveness of Brand Domain Identification Features for Phishing Detection in 2025
- Authors: Rina Mishra, Gaurav Varshney
- Abstract summary: Brand Domain Identification serves as a crucial step in many phishing detection approaches. This study systematically evaluates the effectiveness of features employed over the past decade for BDI.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phishing websites continue to pose a significant security challenge, making the development of robust detection mechanisms essential. Brand Domain Identification (BDI) serves as a crucial step in many phishing detection approaches. This study systematically evaluates the effectiveness of features employed over the past decade for BDI, focusing on their weighted importance in phishing detection as of 2025. The primary objective is to determine whether the identified brand domain matches the claimed domain, utilizing popular features for phishing detection. To validate feature importance and evaluate performance, we conducted two experiments on a dataset comprising 4,667 legitimate sites and 4,561 phishing sites. In Experiment 1, we used the Weka tool and its four attribute ranking evaluators to identify optimized and important feature sets among five features: CN Information (CN), Logo Domain (LD), Form Action Domain (FAD), Most Common Link in Domain (MCLD), and Cookie Domain. The results revealed that none of the features were redundant, and Random Forest emerged as the best classifier, achieving an accuracy of 99.7% with an average response time of 0.08 seconds. In Experiment 2, we trained five machine learning models (Random Forest, Decision Tree, Support Vector Machine, Multilayer Perceptron, and XGBoost) to assess the performance of individual BDI features and their combinations. The results demonstrated an accuracy of 99.8%, achieved with two combinations of only three features each: (Most Common Link Domain, Logo Domain, Form Action) and (Most Common Link Domain, CN Info, Logo Domain), with Random Forest as the best classifier. This study underscores the importance of leveraging key domain features for efficient phishing detection and paves the way for the development of real-time, scalable detection systems.
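The matching step described in the abstract (does the brand domain identified by each feature agree with the domain the page claims?) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the feature keys, and the naive "last two labels" domain rule are all assumptions made for illustration; a real system would likely use the Public Suffix List to find the registrable domain and would feed the resulting vector to a trained classifier such as Random Forest.

```python
from urllib.parse import urlparse

# Hypothetical keys for the five BDI features named in the abstract.
FEATURES = ("cn", "logo", "form_action", "most_common_link", "cookie")


def registered_domain(value):
    """Return a naive registrable domain (last two host labels), or None.

    Assumption: this ignores multi-label suffixes such as "co.uk".
    """
    host = urlparse(value).hostname if "://" in value else value
    if not host:
        return None
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


def bdi_match_vector(claimed_url, observed):
    """Map each feature to 1 if its domain matches the claimed domain, else 0."""
    claimed = registered_domain(claimed_url)
    return {
        name: int(
            claimed is not None
            and registered_domain(observed.get(name) or "") == claimed
        )
        for name in FEATURES
    }


if __name__ == "__main__":
    # Made-up page data for illustration only.
    page = {
        "cn": "example.com",
        "logo": "https://cdn.example.com/logo.png",
        "form_action": "https://evil.test/collect",
        "most_common_link": "https://www.example.com/",
        "cookie": ".example.com",
    }
    print(bdi_match_vector("https://login.example.com/signin", page))
```

In this sketch a missing feature counts as a mismatch (0); whether missing values should be encoded separately is a design choice the abstract does not specify.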
Related papers
- Leveraging Large Language Models for Cybersecurity: Enhancing SMS Spam Detection with Robust and Context-Aware Text Classification [4.281580125566764]
This study evaluates the effectiveness of different feature extraction techniques and classification algorithms in detecting spam messages within SMS data.
We found that TF-IDF, when paired with Naive Bayes, Support Vector Machines, or Deep Neural Networks, provides the most reliable performance.
arXiv Detail & Related papers (2025-02-16T06:39:36Z) - What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes.
Various scoring functions are proposed to distinguish it from in-distribution (ID) data.
We introduce a novel perspective, i.e., employing different common corruptions on the input space.
arXiv Detail & Related papers (2024-10-24T06:47:28Z) - Multimodal Large Language Models for Phishing Webpage Detection and Identification [29.291474807301594]
We study the efficacy of large language models (LLMs) in detecting phishing webpages.
Our system achieves a high detection rate at high precision.
It also provides interpretable evidence for the decisions.
arXiv Detail & Related papers (2024-08-12T06:36:08Z) - Domain penalisation for improved Out-of-Distribution Generalisation [1.979158763744267]
Domain generalisation (DG) aims to ensure robust performance across diverse and unseen target domains.
We propose a framework for the task of object detection, where the data is assumed to be sampled from multiple source domains.
By prioritising the domains that need more attention, our approach effectively balances the training process.
arXiv Detail & Related papers (2024-08-03T11:06:47Z) - DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications [54.93807822347193]
We show how to adapt attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility.
Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE.
Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
arXiv Detail & Related papers (2023-07-05T08:11:40Z) - Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features [23.266183020469065]
We propose a novel framework that disentangles foreground and background features from ID training samples via a dense prediction approach.
It is a generic framework that allows for a seamless combination with various existing OOD detection methods.
arXiv Detail & Related papers (2023-03-15T16:12:14Z) - Triggering Failures: Out-Of-Distribution detection by learning from local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show it obtains top performances both in speed and accuracy when compared to ten recent methods of the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z) - Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency [90.71745178767203]
Deep learning-based 3D object detection has achieved unprecedented success with the advent of large-scale autonomous driving datasets.
Existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.
We study a more realistic setting, unsupervised 3D domain adaptive detection, which only utilizes source domain annotations.
arXiv Detail & Related papers (2021-07-23T17:19:23Z) - Unsupervised Out-of-Domain Detection via Pre-trained Transformers [56.689635664358256]
Out-of-domain inputs can lead to unpredictable outputs and sometimes catastrophic safety issues.
Our work tackles the problem of detecting out-of-domain samples with only unsupervised in-domain data.
Two domain-specific fine-tuning approaches are further proposed to boost detection accuracy.
arXiv Detail & Related papers (2021-06-02T05:21:25Z) - Disentanglement-based Cross-Domain Feature Augmentation for Effective Unsupervised Domain Adaptive Person Re-identification [87.72851934197936]
Unsupervised domain adaptive (UDA) person re-identification (ReID) aims to transfer the knowledge from the labeled source domain to the unlabeled target domain for person matching.
One challenge is how to generate target domain samples with reliable labels for training.
We propose a Disentanglement-based Cross-Domain Feature Augmentation strategy.
arXiv Detail & Related papers (2021-03-25T15:28:41Z) - Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach [3.494620587853103]
This study aims to develop a machine-learning model to detect fraudulent URLs which can be used within the Splunk platform.
Inspired by similar approaches in the literature, we trained the SVM and Random Forests algorithms using malicious and benign datasets.
We evaluated the algorithms' performance with precision and recall, reaching up to 85% precision and 87% recall in the case of Random Forests.
arXiv Detail & Related papers (2020-05-13T21:41:29Z) - High Accuracy Phishing Detection Based on Convolutional Neural Networks [0.0]
We present a deep learning-based approach to enable high accuracy detection of phishing sites.
The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification.
We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites.
arXiv Detail & Related papers (2020-04-08T12:20:14Z)