PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool
- URL: http://arxiv.org/abs/2207.10801v1
- Date: Wed, 13 Jul 2022 20:44:03 GMT
- Title: PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool
- Authors: Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha
- Abstract summary: We propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD).
This measure computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction.
We use the Furthest Point First algorithm to extract phishing prototypes, that is, instances that are representative of a cluster of phishing webpages.
- Score: 12.468922937529966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a feature-free method for detecting phishing
websites using the Normalized Compression Distance (NCD), a parameter-free
similarity measure which computes the similarity of two websites by compressing
them, thus eliminating the need to perform any feature extraction. It also
removes any dependence on a specific set of website features. This method
examines the HTML of webpages and computes their similarity with known phishing
websites, in order to classify them. We use the Furthest Point First algorithm
to perform phishing prototype extractions, in order to select instances that
are representative of a cluster of phishing webpages. We also introduce the use
of an incremental learning algorithm as a framework for continuous and adaptive
detection without extracting new features when concept drift occurs. On a large
dataset, our proposed method significantly outperforms previous methods in
detecting phishing websites, with an AUC score of 98.68%, a high true positive
rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of
0.58%. Our approach uses prototypes, eliminating the need to retain long term
data in the future, and is feasible to deploy in real systems with a processing
time of roughly 0.3 seconds.
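The abstract describes classifying a page by the NCD between its HTML and known phishing HTML. The minimal Python sketch below shows how NCD is computed, assuming zlib as the compressor C (the abstract does not name one). The `is_phishing` rule (flag a page when its NCD to the closest prototype falls below a `threshold`) and the threshold value are hypothetical illustrations, not the paper's decision procedure, and the HTML snippets are made up.

```python
import zlib
from typing import Sequence


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at level 9."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Close to 0 for similar inputs, close to 1 for unrelated inputs."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def is_phishing(page: bytes, prototypes: Sequence[bytes], threshold: float) -> bool:
    """Hypothetical decision rule (assumption, not from the paper): flag the
    page when its NCD to the closest phishing prototype is below threshold."""
    return min(ncd(page, proto) for proto in prototypes) < threshold


# Toy HTML snippets, made up for illustration only.
known_phish = (
    b"<html><head><title>Sign in to your account</title></head><body>"
    b"<form action='/verify' method='post'><input name='email'>"
    b"<input type='password' name='pass'><button>Sign in</button></form>"
    b"</body></html>"
)
# A clone that only changes where the form posts its data.
clone = known_phish.replace(b"/verify", b"http://example.test/collect.php")
news = (
    b"Quarterly revenue grew twelve percent as the company expanded "
    b"into three new markets during the summer."
)

print(round(ncd(known_phish, clone), 3), round(ncd(known_phish, news), 3))
print(is_phishing(clone, [known_phish], threshold=0.6))
print(is_phishing(news, [known_phish], threshold=0.6))
```

No features are extracted: the compressor itself detects the shared structure, which is why a cloned login page sits much closer to its source than unrelated text does.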
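The abstract names the Furthest Point First algorithm for prototype extraction. Below is a generic Python sketch of the classic greedy FPF (k-center) heuristic over any distance function. The abstract does not say how PhishSim chooses the number of prototypes `k` or the starting page, so `k`, `first`, and the function name `furthest_point_first` are assumptions of this sketch.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def furthest_point_first(
    items: Sequence[T],
    k: int,
    dist: Callable[[T, T], float],
    first: int = 0,
) -> List[int]:
    """Indices of up to k prototypes chosen greedily by Furthest Point First."""
    if not items or k <= 0:
        return []
    k = min(k, len(items))
    chosen = [first]
    # gap[i]: distance from items[i] to its closest already-chosen prototype.
    gap = [dist(items[first], x) for x in items]
    gap[first] = -math.inf  # chosen items are never picked again
    while len(chosen) < k:
        # Pick the item that is worst covered by the current prototypes.
        nxt = max(range(len(items)), key=lambda i: gap[i])
        chosen.append(nxt)
        # Update each item's distance to its nearest prototype.
        for i, x in enumerate(items):
            gap[i] = min(gap[i], dist(items[nxt], x))
        gap[nxt] = -math.inf
    return chosen


points = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
print(furthest_point_first(points, 3, lambda a, b: abs(a - b)))  # [0, 5, 3]
```

With phishing HTML pages as `items` and an NCD function as `dist`, the returned indices would be mutually dissimilar pages that cover the phishing set; the cost is roughly k·n distance evaluations.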
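The reported results use three standard metrics: AUC, true positive rate (TPR), and false positive rate (FPR). The short Python sketch below shows how they are defined, treating "phishing" as the positive class. It assumes a suspicion score where higher means more likely phishing (for an NCD-based detector, one could use 1 minus the smallest NCD to a prototype). The inputs are toy values, not the paper's data.

```python
from typing import Sequence, Tuple


def tpr_fpr(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); True means phishing."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def auc(phish_scores: Sequence[float], benign_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random phishing page scores higher
    than a random benign page (ties count as one half)."""
    wins = 0.0
    for s_p in phish_scores:
        for s_b in benign_scores:
            if s_p > s_b:
                wins += 1.0
            elif s_p == s_b:
                wins += 0.5
    return wins / (len(phish_scores) * len(benign_scores))


print(tpr_fpr([True, False, True, False], [True, True, False, False]))  # (0.5, 0.5)
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

A low FPR such as the reported 0.58% matters in practice because benign pages vastly outnumber phishing pages, so even a small FPR produces many false alarms.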
Related papers
- Can Features for Phishing URL Detection Be Trusted Across Diverse Datasets? A Case Study with Explainable AI [0.0]
Phishing has been a prevalent cyber threat that manipulates users into revealing sensitive private information through deceptive tactics.
Proactive detection of phishing URLs (or websites) has been established as a widely accepted defense approach.
We analyze two publicly available phishing URL datasets, where each dataset has its own set of unique and overlapping features related to URL string and website contents.
arXiv Detail & Related papers (2024-11-14T21:07:52Z)
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- PhishNet: A Phishing Website Detection Tool using XGBoost [1.777434178384403]
PhishNet is a cutting-edge web application designed to detect phishing websites using advanced machine learning.
It aims to help individuals and organizations identify and prevent phishing attacks through a robust AI framework.
arXiv Detail & Related papers (2024-06-29T21:31:13Z)
- PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
Motivated by increasing privacy concerns, we propose a parameter-efficient federated anomaly detection framework named PeFAD.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
- Position Paper: Think Globally, React Locally -- Bringing Real-time Reference-based Website Phishing Detection on macOS [0.4962561299282114]
The recent surge in phishing attacks keeps undermining the effectiveness of the traditional anti-phishing blacklist approaches.
On-device anti-phishing solutions are gaining popularity as they offer faster phishing detection locally.
We propose a phishing detection solution that uses a combination of computer vision and on-device machine learning models to analyze websites in real time.
arXiv Detail & Related papers (2024-05-28T14:46:03Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods, which design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- A Sophisticated Framework for the Accurate Detection of Phishing Websites [0.0]
Phishing is an increasingly sophisticated form of cyberattack that is inflicting huge financial damage on corporations throughout the globe.
This paper proposes a comprehensive methodology for detecting phishing websites.
A combination of feature selection, a greedy algorithm, cross-validation, and deep learning methods is used to construct a sophisticated stacking ensemble.
arXiv Detail & Related papers (2024-03-13T14:26:25Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, F1-score, and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computational costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach [3.494620587853103]
This study aims to develop a machine-learning model to detect fraudulent URLs which can be used within the Splunk platform.
Inspired by similar approaches in the literature, we trained the SVM and Random Forests algorithms on malicious and benign datasets.
We evaluated the algorithms' performance with precision and recall, reaching up to 85% precision and 87% recall in the case of Random Forests.
arXiv Detail & Related papers (2020-05-13T21:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.