PhishGAN: Data Augmentation and Identification of Homoglyph Attacks
- URL: http://arxiv.org/abs/2006.13742v3
- Date: Mon, 28 Sep 2020 09:04:11 GMT
- Title: PhishGAN: Data Augmentation and Identification of Homoglyph Attacks
- Authors: Joon Sern Lee, Gui Peng David Yam, Jin Hao Chan
- Abstract summary: Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack.
Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of homoglyphs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Homoglyph attacks are a common technique used by hackers to conduct phishing.
Domain names or links that are visually similar to actual ones are created via
punycode to obfuscate the attack, making the victim more susceptible to
phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com"
and in the process, divulge personal details to the fake website. Current
state-of-the-art (SOTA) approaches typically make use of string comparison
algorithms (e.g., Levenshtein distance), which are computationally heavy. One
reason for this is the lack of publicly available datasets, which hinders the
training of more advanced Machine Learning (ML) models. Furthermore, no one
font is able to
render all types of punycode correctly, posing a significant challenge to the
creation of a dataset that is unbiased toward any particular font. This,
coupled with the vast number of internet domains, poses a challenge in creating
a dataset that can capture all possible variations. Here, we show how a
conditional
Generative Adversarial Network (GAN), PhishGAN, can be used to generate images
of homoglyphs, conditioned on non-homoglyph input text images. Practical
changes to current SOTA were required to facilitate the generation of more
varied homoglyph text-based images. We also demonstrate a workflow of how
PhishGAN together with a Homoglyph Identifier (HI) model can be used to
identify the domain the homoglyph was trying to imitate. Furthermore, we
demonstrate how PhishGAN's ability to generate datasets on the fly facilitates
the quick adaptation of cybersecurity systems to detect new threats as they
emerge.
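As a concrete illustration of the string-comparison baseline and the punycode obfuscation the abstract describes, the sketch below computes the Levenshtein distance between a spoofed domain and the brand it imitates, and decodes a punycode label with Python's built-in codec. The helper function and example strings are illustrative, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)).

    This quadratic cost per domain pair is the kind of computational
    burden the abstract attributes to string-comparison approaches.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

# The abstract's example: "|inkedin.com" differs from "linkedin.com"
# by a single character substitution.
print(levenshtein("|inkedin.com", "linkedin.com"))  # -> 1

# Punycode is the encoding that lets Unicode look-alike labels hide inside
# ASCII domain names. Python's stdlib "punycode" codec decodes the RFC 3492
# example label back to its Unicode form.
print(b"bcher-kva".decode("punycode"))  # -> 'bücher'
```

Note that an edit distance of 1 is also what a benign typo produces, which is one reason purely string-based detection struggles and why the paper turns to image-based comparison instead.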
Related papers
- Mitigating Bias in Machine Learning Models for Phishing Webpage Detection [0.8050163120218178]
Phishing, a well-known cyberattack, revolves around the creation of phishing webpages and the dissemination of corresponding URLs.
Various techniques are available for preemptively categorizing zero-day phishing URLs by distilling unique attributes and constructing predictive models.
This proposal delves into persistent challenges within phishing detection solutions, particularly concentrated on the preliminary phase of assembling comprehensive datasets.
We propose a potential solution in the form of a tool engineered to alleviate bias in ML models.
arXiv Detail & Related papers (2024-01-16T13:45:54Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest.
We demonstrate that this technology can be attacked to generate content that subtly manipulates its users.
We propose a Backdoor Attack on text-to-image Generative Models (BAGM).
Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
- GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks [1.0312968200748118]
Homoglyph attacks create illegitimate domains that are hard to distinguish from legitimate ones.
Existing approaches use simple, string-based comparison techniques applied in primary language-based tasks.
We show that our model can reach state-of-the-art accuracy in detecting homoglyph attacks with a 0.93 AUC on our dataset.
arXiv Detail & Related papers (2023-06-17T17:16:53Z)
- Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z)
- Pseudo Label-Guided Model Inversion Attack via Conditional Generative Adversarial Network [102.21368201494909]
Model inversion (MI) attacks have raised increasing concerns about privacy.
Recent MI attacks leverage a generative adversarial network (GAN) as an image prior to narrow the search space.
We propose a Pseudo Label-Guided MI (PLG-MI) attack via a conditional GAN (cGAN).
arXiv Detail & Related papers (2023-02-20T07:29:34Z)
- A new weighted ensemble model for phishing detection based on feature selection [0.0]
Phishing website identification can assist visitors in avoiding becoming victims of these assaults.
We have proposed an ensemble model that combines multiple base models with a voting technique based on the weights.
arXiv Detail & Related papers (2022-12-15T23:15:36Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image renderer.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- Weaponizing Unicodes with Deep Learning -- Identifying Homoglyphs with Weakly Labeled Data [11.434810426156877]
Visually similar characters, or homoglyphs, can be used to perform social engineering attacks or to evade spam and plagiarism detectors.
We investigate deep learning, transfer learning, and data augmentation to identify potential homoglyphs.
We also use our model to predict over 8,000 previously unknown homoglyphs, and find good early indications that many may be true positives.
arXiv Detail & Related papers (2020-10-09T06:03:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.