PhishZip: A New Compression-based Algorithm for Detecting Phishing
Websites
- URL: http://arxiv.org/abs/2007.11955v1
- Date: Wed, 22 Jul 2020 00:32:06 GMT
- Title: PhishZip: A New Compression-based Algorithm for Detecting Phishing
Websites
- Authors: Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha
- Abstract summary: PhishZip is a novel phishing detection approach using a compression algorithm to perform website classification.
We also propose the use of compression ratio as a novel machine learning feature which significantly improves machine learning based phishing detection.
- Score: 12.468922937529966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phishing has grown significantly in the past few years and is predicted to
further increase in the future. The dynamics of phishing introduce challenges
in implementing a robust phishing detection system and selecting features which
can represent phishing despite the change of attack. In this paper, we propose
PhishZip which is a novel phishing detection approach using a compression
algorithm to perform website classification and demonstrate a systematic way to
construct the word dictionaries for the compression models using word
occurrence likelihood analysis. PhishZip outperforms the use of best-performing
HTML-based features in past studies, with a true positive rate of 80.04%. We
also propose the use of compression ratio as a novel machine learning feature
which significantly improves machine learning based phishing detection over
previous studies. Using compression ratios as additional features, the true
positive rate improves by 30.3 percentage points (from 51.47% to 81.77%), while
the accuracy increases by 11.84 percentage points (from 71.20% to 83.04%).
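The idea of classifying a page by how well it compresses under class-specific word dictionaries can be illustrated with a minimal sketch. It assumes DEFLATE with a preset dictionary (Python's `zlib` with `zdict`), a compression ratio defined as uncompressed size over compressed size, and a simple "higher ratio wins" decision rule. The word lists and the names `PHISHING_WORDS`, `LEGITIMATE_WORDS`, `compression_ratio`, and `classify` are hypothetical and are not taken from the paper, which builds its dictionaries through word occurrence likelihood analysis.

```python
import zlib

# Hypothetical word dictionaries for illustration only; the paper constructs
# its own via word occurrence likelihood analysis, not reproduced here.
PHISHING_WORDS = ["verify", "account", "password", "login", "confirm", "billing"]
LEGITIMATE_WORDS = ["weather", "recipe", "football", "library", "garden", "museum"]


def compression_ratio(text: str, dictionary_words: list[str]) -> float:
    """Return uncompressed size / compressed size using a preset dictionary."""
    data = text.encode("utf-8")
    # The preset dictionary lets DEFLATE encode known words as back-references.
    zdict = " ".join(dictionary_words).encode("utf-8")
    compressor = zlib.compressobj(level=9, zdict=zdict)
    compressed = compressor.compress(data) + compressor.flush()
    return len(data) / len(compressed)


def classify(text: str) -> str:
    """Assumed decision rule: pick the class whose dictionary compresses better."""
    phishing_ratio = compression_ratio(text, PHISHING_WORDS)
    legitimate_ratio = compression_ratio(text, LEGITIMATE_WORDS)
    return "phishing" if phishing_ratio > legitimate_ratio else "legitimate"


if __name__ == "__main__":
    sample = "please verify your account password to login and confirm your billing information"
    print(classify(sample))
```

The two ratios could also be passed to a conventional classifier as extra numeric features, which is the second use of compression ratio that the abstract describes.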
Related papers
- PhishGuard: A Multi-Layered Ensemble Model for Optimal Phishing Website Detection [0.0]
Phishing attacks are a growing cybersecurity threat, leveraging deceptive techniques to steal sensitive information through malicious websites.
This paper introduces PhishGuard, an optimal custom ensemble model designed to improve phishing site detection.
The model combines multiple machine learning classifiers, including Random Forest, Gradient Boosting, CatBoost, and XGBoost, to enhance detection accuracy.
arXiv Detail & Related papers (2024-09-29T23:15:57Z)
- Next Generation of Phishing Attacks using AI powered Browsers [0.0]
The model had an accuracy of 98.32%, precision of 98.62%, recall of 97.86%, and an F1-score of 98.24%.
The zero-day phishing attack detection testing over a 15-day period revealed the model's capability to identify previously unseen threats.
The model successfully detected phishing URLs that evaded detection by Google Safe Browsing.
arXiv Detail & Related papers (2024-06-18T12:24:36Z)
- Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from much fewer measurements with respect to the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool [12.468922937529966]
We propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD).
This measure computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction.
We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages.
arXiv Detail & Related papers (2022-07-13T20:44:03Z)
- Estimating the Resize Parameter in End-to-end Learned Image Compression [50.20567320015102]
We describe a search-free resizing framework that can further improve the rate-distortion tradeoff of recent learned image compression models.
Our results show that our new resizing parameter estimation framework can provide Bjontegaard-Delta rate (BD-rate) improvement of about 10% against leading perceptual quality engines.
arXiv Detail & Related papers (2022-04-26T01:35:02Z)
- Towards Text-based Phishing Detection [0.0]
This paper reports on an experiment into text-based phishing detection using readily available resources and without the use of semantics.
The results obtained in recognizing phishing emails are considerably better than previously reported work, but the rate of text falsely identified as phishing is slightly worse.
arXiv Detail & Related papers (2021-11-02T15:37:33Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- High Accuracy Phishing Detection Based on Convolutional Neural Networks [0.0]
We present a deep learning-based approach to enable high accuracy detection of phishing sites.
The proposed approach utilizes convolutional neural networks (CNN) for high accuracy classification.
We evaluate the models using a dataset obtained from 6,157 genuine and 4,898 phishing websites.
arXiv Detail & Related papers (2020-04-08T12:20:14Z)
- End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model with the facial feature, and experimental results reveal better compression performance in terms of rate-accuracy.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)
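
The Normalized Compression Distance mentioned in the PhishSim entry above is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size. The following is a minimal sketch of that definition, assuming `zlib` as the compressor; PhishSim's actual compressor and preprocessing are not stated in this listing, and the function names and sample pages here are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: lower values mean more similar inputs."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical page snippets for illustration.
    page_a = b"<form action='/login'><input name='user'><input name='password' type='password'></form>" * 4
    page_b = page_a.replace(b"user", b"email")  # near-duplicate of page_a
    page_c = b"The quick brown fox jumps over the lazy dog near the quiet riverbank at dawn. " * 4
    print(ncd(page_a, page_b), ncd(page_a, page_c))
```

Because the distance needs only a compressor, no feature extraction is required, which matches the "feature-free" framing of that entry.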