A Graph-Attentive LSTM Model for Malicious URL Detection
- URL: http://arxiv.org/abs/2510.15971v1
- Date: Sun, 12 Oct 2025 17:10:29 GMT
- Title: A Graph-Attentive LSTM Model for Malicious URL Detection
- Authors: Md. Ifthekhar Hossain, Kazi Abdullah Al Arafat, Bryce Shepard, Kayd Craig, Imtiaz Parvez
- Abstract summary: Blacklist detection methods fail to identify new or obfuscated URLs because they depend on pre-existing patterns. This work presents a hybrid deep learning model named GNN-GAT-LSTM that combines Graph Neural Networks (GNNs) with Graph Attention Networks (GATs) and Long Short-Term Memory (LSTM) networks. The model transforms URLs into graphs through a process where characters become nodes that connect through edges.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Malicious URLs pose significant security risks: they facilitate phishing attacks, distribute malware, and enable attackers to deface websites. Blacklist detection methods fail to identify new or obfuscated URLs because they depend on pre-existing patterns. This work presents a hybrid deep learning model named GNN-GAT-LSTM that combines Graph Neural Networks (GNNs) with Graph Attention Networks (GATs) and Long Short-Term Memory (LSTM) networks. The proposed architecture extracts both structural and sequential patterns from the data. The model transforms URLs into graphs in which characters become nodes connected by edges, and it applies one-hot encoding to represent node features. The model was trained and tested on a collection of 651,191 URLs classified into benign, phishing, defacement, and malware categories. The preprocessing stage included both feature engineering and data-balancing techniques, which addressed the class-imbalance issue to enhance model learning. The GNN-GAT-LSTM model achieved outstanding performance, with a test accuracy of 0.9806 and a weighted F1-score of 0.9804, and it showed excellent precision and recall across most classes, particularly benign and defacement URLs. Overall, the model provides an efficient and scalable system for detecting malicious URLs and demonstrates strong potential for real-world cybersecurity applications.
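The abstract's graph construction and hybrid architecture can be made concrete with a short sketch. The following is a minimal, hypothetical reading of the description, assuming PyTorch and PyTorch Geometric; the character vocabulary, layer sizes, attention-head count, and the exact GAT-to-LSTM wiring are assumptions, since the abstract does not specify them.

```python
# Hedged sketch (not the authors' code): one plausible reading of the
# abstract's pipeline, assuming PyTorch + PyTorch Geometric. The vocabulary,
# layer sizes, attention heads, and GAT->LSTM wiring are all assumptions.
import string

import torch
import torch.nn as nn
from torch_geometric.data import Data
from torch_geometric.nn import GATConv

VOCAB = string.printable                       # assumed character vocabulary
CHAR2IDX = {c: i for i, c in enumerate(VOCAB)}

def url_to_graph(url: str) -> Data:
    """Characters become nodes with one-hot features; consecutive
    characters are joined by bidirectional edges, as the abstract describes."""
    idx = torch.tensor([CHAR2IDX.get(c, 0) for c in url])  # 0 = fallback id
    x = torch.zeros(len(url), len(VOCAB))
    x[torch.arange(len(url)), idx] = 1.0                   # one-hot encoding
    fwd = list(range(len(url) - 1))
    bwd = list(range(1, len(url)))
    edge_index = torch.tensor([fwd + bwd, bwd + fwd], dtype=torch.long)
    return Data(x=x, edge_index=edge_index)

class GNNGATLSTM(nn.Module):
    """Assumed wiring: GAT layers extract structural context per character
    node, an LSTM then reads the nodes in URL order, and a linear head
    classifies into benign / phishing / defacement / malware."""
    def __init__(self, in_dim: int = len(VOCAB), hidden: int = 64,
                 classes: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=4, concat=False)
        self.gat2 = GATConv(hidden, hidden, heads=4, concat=False)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, classes)

    def forward(self, data: Data) -> torch.Tensor:
        h = torch.relu(self.gat1(data.x, data.edge_index))
        h = torch.relu(self.gat2(h, data.edge_index))
        _, (h_n, _) = self.lstm(h.unsqueeze(0))   # nodes kept in URL order
        return self.head(h_n.squeeze(0))          # (1, 4) class logits

logits = GNNGATLSTM()(url_to_graph("http://example.com/login"))
```

Under this reading, attention over the character graph supplies structural context while the LSTM preserves left-to-right order; the paper may order or combine these components differently.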
Related papers
- Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering
This paper presents an AI model for a phishing detection system. It uses an ensemble approach to combine character-level Convolutional Neural Networks (CNN) and LightGBM with engineered features. On a test dataset of 19,873 URLs, the ensemble model achieves an accuracy of 99.819 percent, precision of 100 percent, recall of 99.635 percent, and ROC-AUC of 99.947 percent.
arXiv Detail & Related papers (2025-12-18T16:19:12Z)
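As a rough illustration of the ensemble idea summarized in the entry above (not the paper's exact recipe), soft voting simply averages the class-probability outputs of the two models; the equal weighting below and the function name are assumptions.

```python
# Hedged sketch of a generic soft-voting ensemble: average class
# probabilities from a character-level CNN and a LightGBM model over
# engineered features. The 0.5/0.5 weighting is an assumption.
import numpy as np

def ensemble_predict(cnn_proba: np.ndarray,
                     lgbm_proba: np.ndarray,
                     w_cnn: float = 0.5) -> np.ndarray:
    """Weighted average of two (n_samples, n_classes) probability arrays."""
    proba = w_cnn * cnn_proba + (1.0 - w_cnn) * lgbm_proba
    return proba.argmax(axis=1)   # predicted class per URL

# Toy usage with made-up probabilities for two URLs and two classes:
cnn_p  = np.array([[0.9, 0.1], [0.2, 0.8]])
lgbm_p = np.array([[0.7, 0.3], [0.4, 0.6]])
print(ensemble_predict(cnn_p, lgbm_p))   # -> [0 1]
```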
- Phishing URL Detection using Bi-LSTM
This paper proposes a deep learning-based approach to classify URLs into four categories: benign, phishing, defacement, and malware. Experimental results on a dataset comprising over 650,000 URLs demonstrate the model's effectiveness, achieving 97% accuracy and significant improvements over traditional techniques.
arXiv Detail & Related papers (2025-04-29T00:55:01Z)
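A bidirectional LSTM over URL characters, as in the entry above, can be sketched in a few lines. This is a generic, hypothetical four-class classifier, not the paper's architecture; embedding and hidden sizes are assumptions.

```python
# Hedged sketch of a Bi-LSTM URL classifier for the four classes named
# above; all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMURLClassifier(nn.Module):
    def __init__(self, vocab_size=128, embed=32, hidden=64, classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, classes)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(self.embed(char_ids))
        # Concatenate the forward and backward final hidden states.
        return self.head(torch.cat([h_n[0], h_n[1]], dim=1))

# Toy usage: one URL encoded as ASCII code points, truncated to 64 chars.
ids = torch.tensor([[min(ord(c), 127) for c in "http://evil.example/a"[:64]]])
print(BiLSTMURLClassifier()(ids).shape)   # torch.Size([1, 4])
```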
- Efficient Phishing URL Detection Using Graph-based Machine Learning and Loopy Belief Propagation
We propose a graph-based machine learning model for phishing URL detection. We integrate URL structure and network-level features such as IP addresses and authoritative name servers. Experiments on real-world datasets demonstrate our model's effectiveness, achieving an F1 score of up to 98.77%.
arXiv Detail & Related papers (2025-01-12T19:49:00Z)
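Loopy belief propagation itself is a standard inference algorithm; a toy sum-product implementation over a small URL/infrastructure graph, in the spirit of the entry above, looks like the following. The homophily potential, node types, and priors are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch of loopy belief propagation on a pairwise MRF whose nodes
# are URLs, IPs, or name servers and whose edges encode shared
# infrastructure. A homophily potential nudges linked nodes toward the
# same label (benign vs. phishing).
import numpy as np

def loopy_bp(edges, priors, homophily=0.8, iters=20):
    """priors: {node: [P(benign), P(phishing)]}. Returns posterior beliefs."""
    psi = np.array([[homophily, 1 - homophily],
                    [1 - homophily, homophily]])       # edge potential
    nbrs = {n: set() for n in priors}
    for u, v in edges:
        nbrs[u].add(v); nbrs[v].add(u)
    msg = {(u, v): np.ones(2) for u in priors for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            prod = np.array(priors[u], dtype=float)
            for w in nbrs[u] - {v}:                    # all inbound but v
                prod = prod * msg[(w, u)]
            m = psi.T @ prod                           # marginalize u's state
            new[(u, v)] = m / m.sum()
        msg = new
    beliefs = {}
    for n in priors:
        b = np.array(priors[n], dtype=float)
        for w in nbrs[n]:
            b = b * msg[(w, n)]
        beliefs[n] = b / b.sum()
    return beliefs

# Toy usage: an unlabeled URL linked to a suspected-phishing IP leans phishing.
print(loopy_bp([("url1", "ip1")],
               {"url1": [0.5, 0.5], "ip1": [0.1, 0.9]})["url1"])
```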
- A New Dataset and Methodology for Malicious URL Classification
Malicious URL (Uniform Resource Locator) classification is a pivotal aspect of cybersecurity, offering defense against web-based threats. Despite deep learning's promise in this area, its advancement is hindered by two main challenges: the scarcity of comprehensive, open-source datasets and the limitations of existing models. We introduce DeepURLBench, a novel multi-class dataset for malicious URL classification that distinguishes between benign, phishing, and malicious URLs.
arXiv Detail & Related papers (2024-12-31T09:10:38Z)
- Link Stealing Attacks Against Inductive Graph Neural Networks
A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data.
Previous work has shown that transductive GNNs are vulnerable to a series of privacy attacks.
This paper conducts a comprehensive privacy analysis of inductive GNNs through the lens of link stealing attacks.
arXiv Detail & Related papers (2024-05-09T14:03:52Z)
- PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis
Phishing URL identification is the best way to address the problem.
Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs.
We propose a 1D Convolutional Neural Network (CNN) and train the model on extensive features and a substantial amount of data.
arXiv Detail & Related papers (2024-04-27T17:13:49Z)
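The entry above does not say what the 1D CNN convolves over; one plausible reading is character-level input. The sketch below is a generic, hypothetical 1D character-CNN, with kernel size, channel count, and pooling all assumed.

```python
# Hedged sketch of a 1D character-CNN URL classifier; architecture
# details are illustrative assumptions, not PhishGuard's design.
import torch
import torch.nn as nn

class CharCNN1D(nn.Module):
    def __init__(self, vocab_size=128, embed=32, channels=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.conv = nn.Conv1d(embed, channels, kernel_size=5, padding=2)
        self.head = nn.Linear(channels, classes)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(char_ids).transpose(1, 2)         # (batch, embed, seq)
        h = torch.relu(self.conv(h)).max(dim=2).values   # global max pooling
        return self.head(h)                              # (batch, classes)
```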
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- Resisting Graph Adversarial Attack via Cooperative Homophilous Augmentation
Recent studies show that Graph Neural Networks are vulnerable and easily fooled by small perturbations.
In this work, we focus on the emerging but critical attack, namely, the Graph Injection Attack (GIA).
We propose a general defense framework CHAGNN against GIA through cooperative homophilous augmentation of graph data and model.
arXiv Detail & Related papers (2022-11-15T11:44:31Z)
- Model Inversion Attacks against Graph Neural Networks
We study model inversion attacks against Graph Neural Networks (GNNs).
In this paper, we present GraphMI to infer the private training graph data.
Our experimental results show that such defenses are not sufficiently effective and call for more advanced defenses against privacy attacks.
arXiv Detail & Related papers (2022-09-16T09:13:43Z)
- An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework
Malicious advertisement URLs pose a security risk since they are a source of cyber-attacks. Existing malicious URL detection techniques are limited in their ability to handle unseen features and to generalize to test data. In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to build a system for detecting fraudulent advertisement URLs.
arXiv Detail & Related papers (2022-04-27T20:06:22Z)
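The lexical features mentioned in the entry above are not enumerated in the summary; the sketch below shows a handful of common, hypothetical examples using only the Python standard library.

```python
# Hedged sketch of simple lexical URL features; the paper's actual
# feature set is not specified here, so these are generic examples.
from urllib.parse import urlparse

def lexical_features(url: str) -> dict:
    parsed = urlparse(url)
    return {
        "url_len": len(url),
        "host_len": len(parsed.netloc),
        "num_digits": sum(c.isdigit() for c in url),
        "num_dots": url.count("."),
        "has_ip_like_host": parsed.netloc.replace(".", "").isdigit(),
        "num_params": url.count("="),
    }

print(lexical_features("http://192.168.0.1/ads?id=42"))
```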
- Graph Backdoor
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
- Stealing Links from Graph Neural Networks
Recently, neural networks were extended to graph data; these models are known as graph neural networks (GNNs).
Due to their superior performance, GNNs have many applications, such as healthcare analytics, recommender systems, and fraud detection.
We propose the first attacks to steal a graph from the outputs of a GNN model that is trained on the graph.
arXiv Detail & Related papers (2020-05-05T13:22:35Z)