Phish-Blitz: Advancing Phishing Detection with Comprehensive Webpage Resource Collection and Visual Integrity Preservation
- URL: http://arxiv.org/abs/2509.08375v1
- Date: Wed, 10 Sep 2025 08:13:49 GMT
- Title: Phish-Blitz: Advancing Phishing Detection with Comprehensive Webpage Resource Collection and Visual Integrity Preservation
- Authors: Duddu Hriday, Aditya Kulkarni, Vivek Balachandran, Tamal Das,
- Abstract summary: We introduce Phish-Blitz, a tool that downloads phishing and legitimate webpages along with their associated resources, such as screenshots.<n>Unlike existing tools, Phish-Blitz captures live webpage screenshots and updates resource file paths to maintain the original visual integrity of the webpage.<n>We provide a dataset containing 8,809 legitimate and 5,000 phishing webpages, including all associated resources.
- Score: 0.03262230127283452
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Phishing attacks are increasingly prevalent, with adversaries creating deceptive webpages to steal sensitive information. Despite advancements in machine learning and deep learning for phishing detection, attackers constantly develop new tactics to bypass detection models. As a result, phishing webpages continue to reach users, particularly those unable to recognize phishing indicators. To improve detection accuracy, models must be trained on large datasets containing both phishing and legitimate webpages, including URLs, webpage content, screenshots, and logos. However, existing tools struggle to collect the required resources, especially given the short lifespan of phishing webpages, limiting dataset comprehensiveness. In response, we introduce Phish-Blitz, a tool that downloads phishing and legitimate webpages along with their associated resources, such as screenshots. Unlike existing tools, Phish-Blitz captures live webpage screenshots and updates resource file paths to maintain the original visual integrity of the webpage. We provide a dataset containing 8,809 legitimate and 5,000 phishing webpages, including all associated resources. Our dataset and tool are publicly available on GitHub, contributing to the research community by offering a more complete dataset for phishing detection.
Related papers
- PhishSnap: Image-Based Phishing Detection Using Perceptual Hashing [0.49812879456944986]
Phishing remains one of the most prevalent online threats, exploiting human trust to harvest sensitive credentials.<n>Existing URL- and HTML-based detection systems struggle against obfuscation and visual deception.<n>This paper presents textbfPhishSnap, a privacy-preserving, on-device phishing detection system leveraging perceptual hashing (pHash)
arXiv Detail & Related papers (2025-12-01T22:15:12Z) - Characterizing Phishing Pages by JavaScript Capabilities [77.64740286751834]
This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit.<n>For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs.<n>We find that UI interactivity and basic fingerprinting are universal techniques, present in 90% and 80% of the clusters.
arXiv Detail & Related papers (2025-09-16T15:39:23Z) - Bridging the Gap in Phishing Detection: A Comprehensive Phishing Dataset Collector [0.030786914102688596]
This paper introduces a resource collection tool designed to gather various resources associated with a URL, such as CSS, Javascript, favicons, webpage images, and screenshots.<n>We share a sample dataset generated using our tool comprising 4,056 legitimate and 5,666 phishing URLs along with their associated resources.
arXiv Detail & Related papers (2025-09-11T16:30:12Z) - Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing [71.30876587855867]
We show that even clean query images can be dangerous, inducing malicious target retrieval results, like undesired or illegal images.<n>Specifically, we first train a surrogate model to simulate the behavior of the target deep hashing model.<n>Then, a strict gradient matching strategy is proposed to generate the poisoned images.
arXiv Detail & Related papers (2025-03-27T07:54:27Z) - EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.<n>It is on par with existing deep learning techniques but has better explainability.<n>It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z) - PhishIntel: Toward Practical Deployment of Reference-Based Phishing Detection [33.98293686647553]
PhishIntel is an end-to-end phishing detection system for real-world deployment.<n>It segmenting the detection process into two distinct tasks: a fast task that checks against local blacklists and result cache, and a slow task that conducts online blacklist verification, URL crawling, and webpage analysis.<n>This fast-slow task system architecture ensures low response latency while retaining the robust detection capabilities of reference-based phishing detectors.
arXiv Detail & Related papers (2024-12-12T08:33:39Z) - From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks [0.8050163120218178]
Phishing attacks attempt to deceive users into stealing sensitive information, posing a significant cybersecurity threat.<n>We develop PhishOracle, a tool that generates adversarial phishing webpages by embedding diverse phishing features into legitimate webpages.<n>Our findings highlight the vulnerability of phishing detection models to adversarial attacks, emphasizing the need for more robust detection approaches.
arXiv Detail & Related papers (2024-07-29T18:21:34Z) - KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection [36.014171641453615]
We propose an automated knowledge collection pipeline, containing 20k brands with rich information about each brand.
KnowPhish can be used to boost the performance of existing reference-based phishing detectors.
Our resulting multimodal phishing detection approach, KnowPhish Detector, can detect phishing webpages with or without logos.
arXiv Detail & Related papers (2024-03-04T17:38:32Z) - Mitigating Bias in Machine Learning Models for Phishing Webpage Detection [0.8050163120218178]
Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs.
Various techniques are available for preemptively categorizing zero-day phishing URLs by distilling unique attributes and constructing predictive models.
This proposal delves into persistent challenges within phishing detection solutions, particularly concentrated on the preliminary phase of assembling comprehensive datasets.
We propose a potential solution in the form of a tool engineered to alleviate bias in ML models.
arXiv Detail & Related papers (2024-01-16T13:45:54Z) - Detecting Phishing Sites -- An Overview [0.0]
Phishing is one of the most severe cyber-attacks where researchers are interested to find a solution.
To minimize the damage caused by phishing must be detected as early as possible.
There are various phishing detection techniques based on white-list, black-list, content-based, URL-based, visual-similarity and machine-learning.
arXiv Detail & Related papers (2021-03-23T19:16:03Z) - Being Single Has Benefits. Instance Poisoning to Deceive Malware
Classifiers [47.828297621738265]
We show how an attacker can launch a sophisticated and efficient poisoning attack targeting the dataset used to train a malware classifier.
As opposed to other poisoning attacks in the malware detection domain, our attack does not focus on malware families but rather on specific malware instances that contain an implanted trigger.
We propose a comprehensive detection approach that could serve as a future sophisticated defense against this newly discovered severe threat.
arXiv Detail & Related papers (2020-10-30T15:27:44Z) - Phishing and Spear Phishing: examples in Cyber Espionage and techniques
to protect against them [91.3755431537592]
Phishing attacks have become the most used technique in the online scams, initiating more than 91% of cyberattacks, from 2012 onwards.
This study reviews how Phishing and Spear Phishing attacks are carried out by the phishers, through 5 steps which magnify the outcome.
arXiv Detail & Related papers (2020-05-31T18:10:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.