Enhance the machine learning algorithm performance in phishing detection with keyword features
- URL: http://arxiv.org/abs/2508.09765v1
- Date: Tue, 12 Aug 2025 14:16:11 GMT
- Title: Enhance the machine learning algorithm performance in phishing detection with keyword features
- Authors: Zijiang Yang
- Abstract summary: In a typical phishing attack, the attacker sets up a malicious website that looks similar to the legitimate website in order to obtain the end-users' information. Previous researchers have proposed many machine learning algorithms to distinguish the phishing URLs from the legitimate ones. We propose a novel method to incorporate the keyword features with the traditional features.
- Score: 1.7487745673871375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, we have observed a significant increase in phishing attacks on the Internet. In a typical phishing attack, the attacker sets up a malicious website that looks similar to a legitimate website in order to obtain end-users' information. This may cause the leakage of sensitive information and financial loss for end-users. To avoid such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method to incorporate keyword features with the traditional features. This method is applied to multiple traditional machine learning algorithms, and the experimental results show that it is useful and effective. On average, this method reduces the classification error by 30% on the large dataset. Moreover, its enhancement is more significant on the small dataset. In addition, this method extracts information from the URL and does not rely on additional information provided by third-party services. The best result for a machine learning algorithm using our proposed method achieved an accuracy of 99.68%.
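The abstract does not list the exact features, so the following is only a minimal sketch, assuming a handful of common lexical URL features stand in for the "traditional" features and a hypothetical keyword list stands in for the keyword features; neither set is taken from the paper. It shows how keyword indicators can be concatenated with traditional features into one vector that any traditional classifier could consume, using only information contained in the URL itself.

```python
import re
from urllib.parse import urlparse

# HYPOTHETICAL keyword list for illustration only; the paper's actual
# keyword set and how it was chosen are not given in the abstract.
KEYWORDS = ("login", "verify", "account", "secure", "update", "bank", "paypal", "signin")

# Matches a host that is a bare dotted IPv4 address, e.g. "192.168.0.1".
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def traditional_features(url: str) -> list[float]:
    """A few common lexical URL features (an assumed, not exhaustive, set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                         # overall URL length
        float(url.count(".")),                   # number of dots
        float(url.count("-")),                   # number of hyphens
        float("@" in url),                       # '@' can hide the real host
        float(bool(IPV4_HOST.match(host))),      # raw IPv4 address used as host
        float(parsed.scheme == "https"),         # HTTPS scheme present
        float(sum(ch.isdigit() for ch in url)),  # count of digit characters
    ]


def keyword_features(url: str, keywords=KEYWORDS) -> list[float]:
    """One binary indicator per keyword: 1.0 if it occurs in the lowercased URL."""
    lowered = url.lower()
    return [float(kw in lowered) for kw in keywords]


def combined_features(url: str) -> list[float]:
    """Concatenate traditional and keyword features into a single vector."""
    return traditional_features(url) + keyword_features(url)
```

For example, `combined_features("http://192.168.0.1/paypal-login/verify.php")` yields a 15-element vector whose keyword part flags `login`, `verify`, and `paypal`. A real system would more plausibly derive the keyword list from labeled training URLs than hard-code it as done here.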
Related papers
- RESTOR: Knowledge Recovery in Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can contain private or sensitive information. Several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints. We propose the RESTOR framework for machine unlearning evaluation.
(arXiv, 2024-10-31T20:54:35Z)
- Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy [65.80757820884476]
We expose a critical yet underexplored vulnerability in the deployment of unlearning systems.
We present a threat model where an attacker can degrade model accuracy by submitting adversarial unlearning requests for data not present in the training set.
We evaluate various verification mechanisms to detect the legitimacy of unlearning requests and reveal the challenges in verification.
(arXiv, 2024-10-12T16:47:04Z)
- A Sophisticated Framework for the Accurate Detection of Phishing Websites [0.0]
Phishing is an increasingly sophisticated form of cyberattack that is inflicting huge financial damage to corporations throughout the globe.
This paper proposes a comprehensive methodology for detecting phishing websites.
A combination of feature selection, a greedy algorithm, cross-validation, and deep learning methods has been used to construct a sophisticated stacking ensemble.
(arXiv, 2024-03-13T14:26:25Z)
- Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge.
(arXiv, 2024-03-12T08:40:21Z)
- An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection [23.962076093344166]
We propose an innovative deep learning-based approach for phishing attack localization.
Our method can not only predict the vulnerability of the email data but also automatically learn and identify the most important, phishing-relevant information.
(arXiv, 2024-02-27T00:03:07Z)
- An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework [22.259444589459513]
Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks.
Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data.
In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to build a system for detecting fraudulent advertisement URLs.
(arXiv, 2022-04-27T20:06:22Z)
- Phishing Attacks Detection -- A Machine Learning-Based Approach [0.6445605125467573]
Phishing attacks are one of the most common social engineering attacks, targeting users' emails to fraudulently steal confidential and sensitive information.
In this paper, we propose a phishing attack detection technique based on machine learning.
We collected and analyzed more than 4000 phishing emails targeting the email service of the University of North Dakota.
(arXiv, 2022-01-26T05:08:27Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, F1-score, and accuracy of 98.38%.
(arXiv, 2021-10-10T17:19:37Z)
- Information Theoretic Meta Learning with Gaussian Processes [74.54485310507336]
We formulate meta learning using information theoretic concepts; namely, mutual information and the information bottleneck.
By making use of variational approximations to the mutual information, we derive a general and tractable framework for meta learning.
(arXiv, 2020-09-07T16:47:30Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed that uses the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false alarm rate, and recall.
(arXiv, 2020-08-05T19:29:35Z)
- Cyber Attack Detection thanks to Machine Learning Algorithms [0.0]
This paper explores Machine Learning as a viable solution by examining its capabilities to classify malicious traffic in a network.
Our approach evaluates five different machine learning algorithms on a NetFlow dataset containing common botnets.
The Random Forest succeeds in detecting more than 95% of the botnets in 8 out of 13 scenarios and more than 55% in the most difficult datasets.
(arXiv, 2020-01-17T13:52:12Z)