Towards Web Phishing Detection Limitations and Mitigation
- URL: http://arxiv.org/abs/2204.00985v1
- Date: Sun, 3 Apr 2022 04:26:04 GMT
- Title: Towards Web Phishing Detection Limitations and Mitigation
- Authors: Alsharif Abuadbba, Shuo Wang, Mahathir Almashor, Muhammed Ejaz Ahmed,
Raj Gaire, Seyit Camtepe, Surya Nepal
- Abstract summary: We show how phishing sites bypass Machine Learning-based detection.
Experiments with 100K phishing/benign sites show promising accuracy (98.8%).
We propose Anti-SubtlePhish, a more resilient model based on logistic regression.
- Score: 21.738240693843295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Web phishing remains a serious cyber threat responsible for most data
breaches. Machine Learning (ML)-based anti-phishing detectors are seen as an
effective countermeasure, and are increasingly adopted by web-browsers and
software products. However, with an average of 10K phishing links reported per
hour to platforms such as PhishTank and VirusTotal (VT), the deficiencies of
such ML-based solutions are laid bare. We first explore how phishing sites
bypass ML-based detection with a deep dive into 13K phishing pages targeting
major brands such as Facebook. Results show successful evasion is caused by:
(1) use of benign services to obscure phishing URLs; (2) high similarity
between the HTML structures of phishing and benign pages; (3) hiding the
ultimate phishing content within Javascript and running such scripts only on
the client; (4) looking beyond typical credentials and credit cards for new
content such as IDs and documents; (5) hiding phishing content until after
human interaction. We attribute the root cause to the dependency of ML-based
models on the vertical feature space (webpage content). These solutions rely
only on what phishers present within the page itself. Thus, we propose
Anti-SubtlePhish, a more resilient model based on logistic regression. The key
augmentation is the inclusion of a horizontal feature space, which examines
correlation variables between the final render of suspicious pages against what
trusted services have recorded (e.g., PageRank). To defeat (1) and (2), we
correlate information between WHOIS, PageRank, and page analytics. To combat
(3), (4) and (5), we correlate features after rendering the page. Experiments
with 100K phishing/benign sites show promising accuracy (98.8%). We also
obtained 100% accuracy against 0-day phishing pages that were manually crafted,
comparing well to the 0% recorded by VT vendors over the first four days.
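The abstract describes a logistic regression model that combines a vertical feature space (page content) with a horizontal one (what trusted services such as WHOIS and PageRank have recorded). The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not list the actual features, so the four feature names, the toy samples, and the training settings below are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical feature names; the abstract does not list the real features.
FEATURES = [
    "has_password_form",      # vertical: read from the rendered page
    "external_script_ratio",  # vertical: share of scripts loaded off-domain
    "domain_age_years",       # horizontal: e.g. derived from a WHOIS record
    "page_rank",              # horizontal: e.g. a ranking score scaled to 0..1
]

# Tiny synthetic dataset (label 1 = phishing, 0 = benign). Not real data.
SAMPLES = [
    ([1.0, 0.9, 0.1, 0.0], 1),
    ([1.0, 0.8, 0.0, 0.1], 1),
    ([1.0, 0.7, 0.2, 0.0], 1),
    ([1.0, 0.2, 9.0, 0.8], 0),
    ([0.0, 0.3, 5.0, 0.6], 0),
    ([1.0, 0.1, 12.0, 0.9], 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the phishing class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(samples, epochs=3000, lr=0.02):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            # Gradient of the log loss with respect to the linear score.
            error = predict_proba(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


weights, bias = train(SAMPLES)

# Print the learned weight per feature and the score of each toy sample.
for name, w in zip(FEATURES, weights):
    print(f"{name}: {w:+.3f}")
for x, y in SAMPLES:
    print(y, round(predict_proba(weights, bias, x), 3))
```

In this toy setup both phishing and benign pages can contain a password form, so the vertical features alone separate the classes poorly, while the horizontal features (domain age, rank) carry most of the signal. That mirrors the motivation given in the abstract, but the numbers here say nothing about the paper's reported results.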
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks [0.8050163120218178]
Phishing attacks attempt to deceive users in order to steal sensitive information.
Current phishing webpage detection solutions are vulnerable to adversarial attacks.
We develop a tool that generates adversarial phishing webpages by embedding diverse phishing features into legitimate webpages.
arXiv Detail & Related papers (2024-07-29T18:21:34Z)
- Position Paper: Think Globally, React Locally -- Bringing Real-time Reference-based Website Phishing Detection on macOS [0.4962561299282114]
The recent surge in phishing attacks keeps undermining the effectiveness of the traditional anti-phishing blacklist approaches.
On-device anti-phishing solutions are gaining popularity as they offer faster phishing detection locally.
We propose a phishing detection solution that uses a combination of computer vision and on-device machine learning models to analyze websites in real time.
arXiv Detail & Related papers (2024-05-28T14:46:03Z)
- "Are Adversarial Phishing Webpages a Threat in Reality?" Understanding the Users' Perception of Adversarial Webpages [21.474375992224633]
Machine learning based phishing website detectors (ML-PWD) are a critical part of today's anti-phishing solutions in operation.
We show that adversarial phishing is a threat to both users and ML-PWD.
We also show that users' self-reported frequency of visiting a brand's website has a statistically negative correlation with their phishing detection accuracy.
arXiv Detail & Related papers (2024-04-03T16:10:17Z)
- Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning [57.50274256088251]
We show that parameter-efficient fine-tuning (PEFT) is more susceptible to weight-poisoning backdoor attacks.
We develop a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through confidence.
We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods.
arXiv Detail & Related papers (2024-02-19T14:22:54Z)
- "Do Users fall for Real Adversarial Phishing?" Investigating the Human response to Evasive Webpages [7.779975012737389]
State-of-the-art solutions entail the application of machine learning to detect phishing websites by checking if they visually resemble webpages of well-known brands.
Some security companies have also begun to deploy them in their phishing detection systems (PDS).
In this paper, we scrutinize whether 'genuine phishing websites' that evade 'commercial ML-based PDS' represent a problem "in reality".
arXiv Detail & Related papers (2023-11-28T00:08:48Z)
- DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection.
Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables.
We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels.
We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
- PhishMatch: A Layered Approach for Effective Detection of Phishing URLs [8.658596218544774]
We present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side.
A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight.
arXiv Detail & Related papers (2021-12-04T03:21:29Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, F1-score, and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Phishing and Spear Phishing: examples in Cyber Espionage and techniques to protect against them [91.3755431537592]
Phishing attacks have become the most used technique in online scams, initiating more than 91% of cyberattacks from 2012 onwards.
This study reviews how phishers carry out Phishing and Spear Phishing attacks through 5 steps which magnify the outcome.
arXiv Detail & Related papers (2020-05-31T18:10:09Z)