Related papers: GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks

URL: http://arxiv.org/abs/2306.10392v1
Date: Sat, 17 Jun 2023 17:16:53 GMT
Title: GlyphNet: Homoglyph domains dataset and detection using attention-based Convolutional Neural Networks
Authors: Akshat Gupta, Laxman Singh Tomar, Ridhima Garg
Abstract summary: Homoglyph attacks create illegitimate domains that are hard to distinguish from legit ones. Existing approaches use simple, string-based comparison techniques applied in primary language-based tasks. We show that our model can reach state-of-the-art accuracy in detecting homoglyph attacks with a 0.93 AUC on our dataset.
Score: 1.0312968200748118
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cyber attacks deceive machines into believing something that does not exist in the first place. However, there are some to which even humans fall prey. One such famous attack that attackers have used over the years to exploit the vulnerability of vision is known to be a Homoglyph attack. It employs a primary yet effective mechanism to create illegitimate domains that are hard to differentiate from legit ones. Moreover, as the difference is pretty indistinguishable for a user to notice, they cannot stop themselves from clicking on these homoglyph domain names. In many cases, that results in either information theft or malware attack on their systems. Existing approaches use simple, string-based comparison techniques applied in primary language-based tasks. Although they are impactful to some extent, they usually fail because they are not robust to different types of homoglyphs and are computationally not feasible because of their time requirement proportional to the string length. Similarly, neural network-based approaches are employed to determine real domain strings from fake ones. Nevertheless, the problem with both methods is that they require paired sequences of real and fake domain strings to work with, which is often not the case in the real world, as the attacker only sends the illegitimate or homoglyph domain to the vulnerable user. Therefore, existing approaches are not suitable for practical scenarios in the real world. In our work, we created GlyphNet, an image dataset that contains 4M domains, both real and homoglyphs. Additionally, we introduce a baseline method for a homoglyph attack detection system using an attention-based convolutional Neural Network. We show that our model can reach state-of-the-art accuracy in detecting homoglyph attacks with a 0.93 AUC on our dataset.
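
To make the attack concrete, here is a minimal Python sketch of a homoglyph domain and of a simple unpaired check that needs only the suspicious domain, not a real/fake pair. It is not the GlyphNet method, which classifies rendered images of domains with an attention-based CNN. The example domains and the helper names (`scripts_in_label`, `looks_mixed_script`, `to_punycode`) are hypothetical and introduced only for illustration. The mixed-script rule is a rough heuristic assumed here: it misses homoglyphs drawn from a single script.

```python
import unicodedata


def scripts_in_label(label):
    """Return coarse script names (first word of the Unicode name) for the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for code points without a Unicode name.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(domain):
    """Flag a domain if any dot-separated label mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


def to_punycode(domain):
    """Encode a domain to its ASCII (IDNA/punycode) form using Python's built-in codec."""
    return domain.encode("idna").decode("ascii")


real = "apple.com"
fake = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A, not Latin "a"

for domain in (real, fake):
    print(ascii(domain), to_punycode(domain), looks_mixed_script(domain))
```

The two strings render almost identically, yet only the second gets an `xn--` punycode form and trips the mixed-script check. A check like this works on a single string, which matches the unpaired setting the abstract describes.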

Related papers

Web Artifact Attacks Disrupt Vision Language Models [61.59021920232986]
Vision-language models (VLMs) are trained on large-scale, lightly curated web datasets. They learn unintended correlations between semantic concepts and unrelated visual signals. Prior work has weaponized these correlations as an attack vector to manipulate model predictions. We introduce "artifact-based" attacks: a novel class of manipulations that mislead models using both non-matching text and graphical elements.
arXiv Detail & Related papers (2025-03-17T18:59:29Z)
The Devil is in the Conflict: Disentangled Information Graph Neural Networks for Fraud Detection [17.254383007779616]
We argue that the performance degradation is mainly attributed to the inconsistency between topology and attribute. We propose a simple and effective method that uses the attention mechanism to adaptively fuse two views. Our model can significantly outperform state-of-the-art baselines on real-world fraud detection datasets.
arXiv Detail & Related papers (2022-10-22T08:21:49Z)
BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability [12.079529913120593]
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. We take inspiration from the concept of Applicability Domain in cheminformatics. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally.
arXiv Detail & Related papers (2021-05-02T15:24:33Z)
Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data. We propose a novel attack paradigm, the fine-grained attack, where we treat the target label from the object-level instead of the image-level. Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
A Free Lunch for Unsupervised Domain Adaptive Object Detection without Source Data [69.091485888121]
Unsupervised domain adaptation assumes that source and target domain data are freely available and usually trained together to reduce the domain gap. We propose a source data-free domain adaptive object detection (SFOD) framework via modeling it into a problem of learning with noisy labels.
arXiv Detail & Related papers (2020-12-10T01:42:35Z)
MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We have proposed a deep learning-based network termed MixNet to detect presentation attacks. The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z)
Weaponizing Unicodes with Deep Learning -- Identifying Homoglyphs with Weakly Labeled Data [11.434810426156877]
Visually similar characters, or homoglyphs, can be used to perform social engineering attacks or to evade spam and plagiarism detectors. We investigate a learning, transfer learning, and augmentation model to identify potential homoglyphs. We also use our model to predict over 8,000 previously unknown homoglyphs, and find good early indications that many may be true positives.
arXiv Detail & Related papers (2020-10-09T06:03:18Z)
Adversarial Attack on Large Scale Graph [58.741365277995044]
Recent studies have shown that graph neural networks (GNNs) are vulnerable to perturbations due to a lack of robustness. Currently, most works on attacking GNNs mainly use gradient information to guide the attack and achieve outstanding performance. We argue that the main reason is that they have to use the whole graph for attacks, resulting in increasing time and space complexity as the data scale grows. We present a practical metric named Degree Assortativity Change (DAC) to measure the impacts of adversarial attacks on graph data.
arXiv Detail & Related papers (2020-09-08T02:17:55Z)
Patch-wise Attack for Fooling Deep Neural Network [153.59832333877543]
We propose a patch-wise iterative algorithm -- a black-box attack towards mainstream normally trained and defense models. We significantly improve the success rate by 9.2% for defense models and 3.7% for normally trained models on average.
arXiv Detail & Related papers (2020-07-14T01:50:22Z)
PhishGAN: Data Augmentation and Identification of Homoglpyh Attacks [0.0]
Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack. Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of homoglyphs.
arXiv Detail & Related papers (2020-06-24T13:59:09Z)
Adversarial Feature Desensitization [12.401175943131268]
We propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
arXiv Detail & Related papers (2020-06-08T14:20:02Z)
Stealing Links from Graph Neural Networks [72.85344230133248]
Recently, neural networks were extended to graph data, which are known as graph neural networks (GNNs). Due to their superior performance, GNNs have many applications, such as healthcare analytics, recommender systems, and fraud detection. We propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.
arXiv Detail & Related papers (2020-05-05T13:22:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.