Characterizing Phishing Pages by JavaScript Capabilities
- URL: http://arxiv.org/abs/2509.13186v1
- Date: Tue, 16 Sep 2025 15:39:23 GMT
- Title: Characterizing Phishing Pages by JavaScript Capabilities
- Authors: Aleksandr Nahapetyan, Kanv Khare, Kevin Schwarz, Bradley Reaves, Alexandros Kapravelos,
- Abstract summary: This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit.<n>For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs.<n>We find that UI interactivity and basic fingerprinting are universal techniques, present in 90% and 80% of the clusters.
- Score: 77.64740286751834
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In 2024, the Anti-Phishing Work Group identified over one million phishing pages. Phishers achieve this scale by using phishing kits -- ready-to-deploy phishing websites -- to rapidly deploy phishing campaigns with specific data exfiltration, evasion, or mimicry techniques. In contrast, researchers and defenders continue to fight phishing on a page-by-page basis and rely on manual analysis to recognize static features for kit identification. This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit, automating a previously manual process, and enabling us to measure how popular different client-side techniques are across these groups. For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs. On an unlabeled dataset, we leverage the complexity of 434,050 phishing pages' JavaScript logic to group them into 11,377 clusters, annotating the clusters with what phishing techniques they employ. We find that UI interactivity and basic fingerprinting are universal techniques, present in 90% and 80% of the clusters, respectively. On the other hand, mouse detection via the browser's mouse API is among the rarest behaviors, despite being used in a deployment of a 7-year-old open-source phishing kit. Our methods and findings provide new ways for researchers and analysts to tackle the volume of phishing pages.
Related papers
- PhishSnap: Image-Based Phishing Detection Using Perceptual Hashing [0.49812879456944986]
Phishing remains one of the most prevalent online threats, exploiting human trust to harvest sensitive credentials.<n>Existing URL- and HTML-based detection systems struggle against obfuscation and visual deception.<n>This paper presents textbfPhishSnap, a privacy-preserving, on-device phishing detection system leveraging perceptual hashing (pHash)
arXiv Detail & Related papers (2025-12-01T22:15:12Z) - Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks [43.26879314353337]
This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system.<n>By changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases.<n>For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate.
arXiv Detail & Related papers (2025-10-02T05:04:44Z) - Bridging the Gap in Phishing Detection: A Comprehensive Phishing Dataset Collector [0.030786914102688596]
This paper introduces a resource collection tool designed to gather various resources associated with a URL, such as CSS, Javascript, favicons, webpage images, and screenshots.<n>We share a sample dataset generated using our tool comprising 4,056 legitimate and 5,666 phishing URLs along with their associated resources.
arXiv Detail & Related papers (2025-09-11T16:30:12Z) - Phish-Blitz: Advancing Phishing Detection with Comprehensive Webpage Resource Collection and Visual Integrity Preservation [0.03262230127283452]
We introduce Phish-Blitz, a tool that downloads phishing and legitimate webpages along with their associated resources, such as screenshots.<n>Unlike existing tools, Phish-Blitz captures live webpage screenshots and updates resource file paths to maintain the original visual integrity of the webpage.<n>We provide a dataset containing 8,809 legitimate and 5,000 phishing webpages, including all associated resources.
arXiv Detail & Related papers (2025-09-10T08:13:49Z) - EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.<n>It is on par with existing deep learning techniques but has better explainability.<n>It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z) - From ML to LLM: Evaluating the Robustness of Phishing Webpage Detection Models against Adversarial Attacks [0.8050163120218178]
Phishing attacks attempt to deceive users into stealing sensitive information, posing a significant cybersecurity threat.<n>We develop PhishOracle, a tool that generates adversarial phishing webpages by embedding diverse phishing features into legitimate webpages.<n>Our findings highlight the vulnerability of phishing detection models to adversarial attacks, emphasizing the need for more robust detection approaches.
arXiv Detail & Related papers (2024-07-29T18:21:34Z) - Prompted Contextual Vectors for Spear-Phishing Detection [41.26408609344205]
Spear-phishing attacks present a significant security challenge.<n>We propose a detection approach based on a novel document vectorization method.<n>Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z) - Towards Web Phishing Detection Limitations and Mitigation [21.738240693843295]
We show how phishing sites bypass Machine Learning-based detection.
Experiments with 100K phishing/benign sites show promising accuracy (98.8%)
We propose Anti-SubtlePhish, a more resilient model based on logistic regression.
arXiv Detail & Related papers (2022-04-03T04:26:04Z) - PhishMatch: A Layered Approach for Effective Detection of Phishing URLs [8.658596218544774]
We present a layered anti-phishing defense, PhishMatch, which is robust, accurate, inexpensive, and client-side.
A prototype plugin of PhishMatch, developed for the Chrome browser, was found to be fast and lightweight.
arXiv Detail & Related papers (2021-12-04T03:21:29Z) - Detecting malicious PDF using CNN [46.86114958340962]
Malicious PDF files represent one of the biggest threats to computer security.
We propose a novel algorithm that uses an ensemble of Convolutional Neural Network (CNN) on the byte level of the file.
We show, using a data set of 90000 files downloadable online, that our approach maintains a high detection rate (94%) of PDF malware.
arXiv Detail & Related papers (2020-07-24T18:27:45Z) - Phishing and Spear Phishing: examples in Cyber Espionage and techniques
to protect against them [91.3755431537592]
Phishing attacks have become the most used technique in the online scams, initiating more than 91% of cyberattacks, from 2012 onwards.
This study reviews how Phishing and Spear Phishing attacks are carried out by the phishers, through 5 steps which magnify the outcome.
arXiv Detail & Related papers (2020-05-31T18:10:09Z) - Keystroke Biometrics in Response to Fake News Propagation in a Global
Pandemic [77.79066811371978]
This work proposes and analyzes the use of keystroke biometrics for content de-anonymization.
Fake news have become a powerful tool to manipulate public opinion, especially during major events.
arXiv Detail & Related papers (2020-05-15T17:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.