Related papers: Towards Web Phishing Detection Limitations and Mitigation

URL: http://arxiv.org/abs/2204.00985v1
Date: Sun, 3 Apr 2022 04:26:04 GMT
Title: Towards Web Phishing Detection Limitations and Mitigation
Authors: Alsharif Abuadbba, Shuo Wang, Mahathir Almashor, Muhammed Ejaz Ahmed, Raj Gaire, Seyit Camtepe, Surya Nepal
Abstract summary: We show how phishing sites bypass Machine Learning-based detection. Experiments with 100K phishing/benign sites show promising accuracy (98.8%). We propose Anti-SubtlePhish, a more resilient model based on logistic regression.
Score: 21.738240693843295
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Web phishing remains a serious cyber threat responsible for most data breaches. Machine Learning (ML)-based anti-phishing detectors are seen as an effective countermeasure, and are increasingly adopted by web browsers and software products. However, with an average of 10K phishing links reported per hour to platforms such as PhishTank and VirusTotal (VT), the deficiencies of such ML-based solutions are laid bare. We first explore how phishing sites bypass ML-based detection with a deep dive into 13K phishing pages targeting major brands such as Facebook. Results show successful evasion is caused by: (1) use of benign services to obscure phishing URLs; (2) high similarity between the HTML structures of phishing and benign pages; (3) hiding the ultimate phishing content within JavaScript and running such scripts only on the client; (4) looking beyond typical credentials and credit cards for new content such as IDs and documents; (5) hiding phishing content until after human interaction. We attribute the root cause to the dependency of ML-based models on the vertical feature space (webpage content). These solutions rely only on what phishers present within the page itself. Thus, we propose Anti-SubtlePhish, a more resilient model based on logistic regression. The key augmentation is the inclusion of a horizontal feature space, which examines correlation variables between the final render of suspicious pages and what trusted services have recorded (e.g., PageRank). To defeat (1) and (2), we correlate information between WHOIS, PageRank, and page analytics. To combat (3), (4) and (5), we correlate features after rendering the page. Experiments with 100K phishing/benign sites show promising accuracy (98.8%). We also obtained 100% accuracy against 0-day phishing pages that were manually crafted, comparing well to the 0% recorded by VT vendors over the first four days.
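The abstract's central idea, a logistic regression that combines vertical features (page content) with horizontal features (what trusted services record about the domain), can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the four feature names, the toy data, and the plain gradient-descent trainer are hypothetical and are not taken from the paper or from Anti-SubtlePhish itself.

```python
import math

# Hypothetical feature layout (NOT from the paper):
#   [has_login_form, brand_similarity, domain_age_norm, pagerank_norm]
# The first two are "vertical" (page content); the last two are "horizontal"
# (what third parties such as WHOIS or a ranking service record).
# Toy, hand-made data: label 1 = phishing, 0 = benign.
TRAIN = [
    ([1.0, 0.90, 0.0, 0.0], 1),  # brand-like login page, brand-new domain
    ([1.0, 0.80, 0.1, 0.0], 1),
    ([1.0, 0.95, 0.0, 0.1], 1),
    ([1.0, 0.90, 1.0, 0.9], 0),  # same page content, long-lived ranked domain
    ([1.0, 0.85, 0.8, 0.7], 0),
    ([0.0, 0.10, 0.6, 0.5], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the phishing class."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(data, epochs=2000, lr=0.5):
    """Fit logistic regression with full-batch gradient descent."""
    n = len(data[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in data:
            err = predict_proba(weights, bias, x) - y  # d(log-loss)/dz
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


weights, bias = train(TRAIN)

# Two pages with identical vertical features; only the horizontal ones differ.
suspicious = [1.0, 0.9, 0.05, 0.0]
established = [1.0, 0.9, 0.90, 0.8]
print(predict_proba(weights, bias, suspicious))
print(predict_proba(weights, bias, established))
```

In this toy setup the two test pages look the same from their content alone, so only the horizontal features can separate them, which is the kind of signal the abstract argues that content-only detectors lack.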

Related papers

CIC-Trap4Phish: A Unified Multi-Format Dataset for Phishing and Quishing Attachment Detection [35.21543593148398]
Phishing attacks represent one of the primary attack methods used by cyber attackers. The CIC-Trap4Phish dataset contains both malicious and benign samples across five categories commonly used in phishing campaigns.
arXiv Detail & Related papers (2026-02-09T18:57:00Z)
Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models [71.44858461725893]
Given a model fine-tuned by an untrusted third party, determining whether the model has been injected with a backdoor is a critical and challenging problem. Existing detection methods usually rely on prior knowledge of training dataset, backdoor triggers and targets. We introduce Assimilation Matters in DETection (AMDET), a novel model-level detection framework that operates without any such prior knowledge.
arXiv Detail & Related papers (2025-11-29T06:20:00Z)
Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks [43.26879314353337]
This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system. By changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate.
arXiv Detail & Related papers (2025-10-02T05:04:44Z)
Characterizing Phishing Pages by JavaScript Capabilities [77.64740286751834]
This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit. For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs. We find that UI interactivity and basic fingerprinting are universal techniques, present in 90% and 80% of the clusters.
arXiv Detail & Related papers (2025-09-16T15:39:23Z)
PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation [13.177607247367211]
We propose PhishIntentionLLM, a framework that uncovers phishing intentions from website screenshots. Our framework identifies four key phishing objectives: Credential Theft, Financial Fraud, Malware Distribution, and Personal Information Harvesting. We generate a larger dataset of 9K samples for large-scale phishing intention profiling across sectors.
arXiv Detail & Related papers (2025-07-21T09:20:43Z)
EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture. It is on par with existing deep learning techniques but has better explainability. It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z)
Towards Invisible Backdoor Attack on Text-to-Image Diffusion Model [70.03122709795122]
Backdoor attacks targeting text-to-image diffusion models have advanced rapidly. Current backdoor samples often exhibit two key abnormalities compared to benign samples. We propose a novel Invisible Backdoor Attack (IBA) to enhance the stealthiness of backdoor samples.
arXiv Detail & Related papers (2025-03-22T10:41:46Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks [0.8050163120218178]
Phishing attacks attempt to deceive users in order to steal sensitive information. Current phishing webpage detection solutions are vulnerable to adversarial attacks. We develop a tool that generates adversarial phishing webpages by embedding diverse phishing features into legitimate webpages.
arXiv Detail & Related papers (2024-07-29T18:21:34Z)
Position Paper: Think Globally, React Locally -- Bringing Real-time Reference-based Website Phishing Detection on macOS [0.4962561299282114]
The recent surge in phishing attacks keeps undermining the effectiveness of the traditional anti-phishing blacklist approaches. On-device anti-phishing solutions are gaining popularity as they offer faster phishing detection locally. We propose a phishing detection solution that uses a combination of computer vision and on-device machine learning models to analyze websites in real time.
arXiv Detail & Related papers (2024-05-28T14:46:03Z)
"Are Adversarial Phishing Webpages a Threat in Reality?" Understanding the Users' Perception of Adversarial Webpages [21.474375992224633]
Machine learning based phishing website detectors (ML-PWD) are a critical part of today's anti-phishing solutions in operation. We show that adversarial phishing is a threat to both users and ML-PWD. We also show that users' self-reported frequency of visiting a brand's website has a statistically negative correlation with their phishing detection accuracy.
arXiv Detail & Related papers (2024-04-03T16:10:17Z)
Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning [57.50274256088251]
We show that parameter-efficient fine-tuning (PEFT) is more susceptible to weight-poisoning backdoor attacks. We develop a Poisoned Sample Identification Module (PSIM) leveraging PEFT, which identifies poisoned samples through confidence. We conduct experiments on text classification tasks, five fine-tuning strategies, and three weight-poisoning backdoor attack methods.
arXiv Detail & Related papers (2024-02-19T14:22:54Z)
"Do Users fall for Real Adversarial Phishing?" Investigating the Human response to Evasive Webpages [7.779975012737389]
State-of-the-art solutions entail the application of machine learning to detect phishing websites by checking if they visually resemble webpages of well-known brands. Some security companies have also begun to deploy them in their phishing detection systems (PDS). In this paper, we scrutinize whether 'genuine phishing websites' that evade 'commercial ML-based PDS' represent a problem "in reality".
arXiv Detail & Related papers (2023-11-28T00:08:48Z)
DRSM: De-Randomized Smoothing on Malware Classifier Providing Certified Robustness [58.23214712926585]
We develop a certified defense, DRSM (De-Randomized Smoothed MalConv), by redesigning the de-randomized smoothing technique for the domain of malware detection. Specifically, we propose a window ablation scheme to provably limit the impact of adversarial bytes while maximally preserving local structures of the executables. We are the first to offer certified robustness in the realm of static detection of malware executables.
arXiv Detail & Related papers (2023-03-20T17:25:22Z)
An Embarrassingly Simple Backdoor Attack on Self-supervised Learning [52.28670953101126]
Self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels. We study the inherent vulnerability of SSL to backdoor attacks.
arXiv Detail & Related papers (2022-10-13T20:39:21Z)
Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models. Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z)
PhishMatch: A Layered Approach for Effective Detection of Phishing URLs [8.658596218544774]
We present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side. A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight.
arXiv Detail & Related papers (2021-12-04T03:21:29Z)
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
Being Single Has Benefits. Instance Poisoning to Deceive Malware Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier. As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger. We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z)
Phishing and Spear Phishing: examples in Cyber Espionage and techniques to protect against them [91.3755431537592]
Phishing attacks have become the most used technique in online scams, initiating more than 91% of cyberattacks from 2012 onwards. This study reviews how Phishing and Spear Phishing attacks are carried out by the phishers, through 5 steps which magnify the outcome.
arXiv Detail & Related papers (2020-05-31T18:10:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.