PhishGAN: Data Augmentation and Identification of Homoglyph Attacks
- URL: http://arxiv.org/abs/2006.13742v3
- Date: Mon, 28 Sep 2020 09:04:11 GMT
- Title: PhishGAN: Data Augmentation and Identification of Homoglyph Attacks
- Authors: Joon Sern Lee, Gui Peng David Yam, Jin Hao Chan
- Abstract summary: Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack.
Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of homoglyphs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Homoglyph attacks are a common technique used by hackers to conduct phishing.
Domain names or links that are visually similar to actual ones are created via
punycode to obfuscate the attack, making the victim more susceptible to
phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com"
and in the process, divulge personal details to the fake website. Current
state-of-the-art (SOTA) approaches typically make use of string comparison
algorithms (e.g., Levenshtein distance), which are computationally heavy. One
reason for this is the lack of publicly available datasets, which hinders the
training of more advanced Machine Learning (ML) models. Furthermore, no one
font is able to
render all types of punycode correctly, posing a significant challenge to the
creation of a dataset that is unbiased toward any particular font. This,
coupled with the vast number of internet domains, poses a challenge in creating
a dataset that can capture all possible variations. Here, we show how a
conditional
Generative Adversarial Network (GAN), PhishGAN, can be used to generate images
of homoglyphs, conditioned on non-homoglyph input text images. Practical
changes to current SOTA were required to facilitate the generation of more
varied homoglyph text-based images. We also demonstrate a workflow of how
PhishGAN together with a Homoglyph Identifier (HI) model can be used to
identify the domain the homoglyph was trying to imitate. Furthermore, we
demonstrate how PhishGAN's ability to generate datasets on the fly facilitates
the quick adaptation of cybersecurity systems to detect new threats as they
emerge.
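As a concrete illustration of the string-comparison baseline and the punycode obfuscation the abstract describes, the sketch below computes the Levenshtein distance between a spoofed domain and the brand it imitates, and decodes a punycode label with Python's built-in codec. The helper function and example strings are illustrative, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)).

    This quadratic cost per domain pair is the kind of computational
    burden the abstract attributes to string-comparison approaches.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

# The abstract's example: "|inkedin.com" differs from "linkedin.com"
# by a single character substitution.
print(levenshtein("|inkedin.com", "linkedin.com"))  # -> 1

# Punycode is the encoding that lets Unicode look-alike labels hide inside
# ASCII domain names. Python's stdlib "punycode" codec decodes the RFC 3492
# example label back to its Unicode form.
print(b"bcher-kva".decode("punycode"))  # -> 'bücher'
```

Note that an edit distance of 1 is also what a benign typo produces, which is one reason purely string-based detection struggles and why the paper turns to image-based comparison instead.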
Related papers
- Mitigating Bias in Machine Learning Models for Phishing Webpage Detection [0.8050163120218178]
Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs.
Various techniques are available for preemptively categorizing zero-day phishing URLs by distilling unique attributes and constructing predictive models.
This proposal delves into persistent challenges within phishing detection solutions, particularly concentrated on the preliminary phase of assembling comprehensive datasets.
We propose a potential solution in the form of a tool engineered to alleviate bias in ML models.
arXiv Detail & Related papers (2024-01-16T13:45:54Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks [1.0312968200748118]
Homoglyph attacks create illegitimate domains that are hard to distinguish from legitimate ones.
Existing approaches use simple, string-based comparison techniques applied in primary language-based tasks.
We show that our model can reach state-of-the-art accuracy in detecting homoglyph attacks with a 0.93 AUC on our dataset.
arXiv Detail & Related papers (2023-06-17T17:16:53Z)
- Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z)
- Pseudo Label-Guided Model Inversion Attack via Conditional Generative Adversarial Network [102.21368201494909]
Model inversion (MI) attacks have raised increasing concerns about privacy.
Recent MI attacks leverage a generative adversarial network (GAN) as an image prior to narrow the search space.
We propose a Pseudo Label-Guided MI (PLG-MI) attack via a conditional GAN (cGAN).
arXiv Detail & Related papers (2023-02-20T07:29:34Z)
- A new weighted ensemble model for phishing detection based on feature selection [0.0]
Phishing website identification can assist visitors in avoiding becoming victims of these assaults.
We have proposed an ensemble model that combines multiple base models with a voting technique based on the weights.
arXiv Detail & Related papers (2022-12-15T23:15:36Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image renderer.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- Weaponizing Unicodes with Deep Learning -- Identifying Homoglyphs with Weakly Labeled Data [11.434810426156877]
Visually similar characters, or homoglyphs, can be used to perform social engineering attacks or to evade spam and plagiarism detectors.
We investigate deep learning, transfer learning, and data augmentation to identify potential homoglyphs.
We also use our model to predict over 8,000 previously unknown homoglyphs, and find good early indications that many may be true positives.
arXiv Detail & Related papers (2020-10-09T06:03:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.