Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering
- URL: http://arxiv.org/abs/2512.16717v1
- Date: Thu, 18 Dec 2025 16:19:12 GMT
- Title: Phishing Detection System: An Ensemble Approach Using Character-Level CNN and Feature Engineering
- Authors: Rudra Dubey, Arpit Mani Tripathi, Archit Srivastava, Sarvpal Singh,
- Abstract summary: This paper presents an AI model for a phishing detection system.<n>It uses an ensemble approach to combine character-level Convolutional Neural Networks (CNN) and LightGBM with engineered features.<n>On a test dataset of 19,873 URLs, the ensemble model achieves an accuracy of 99.819 percent, precision of 100 percent, recall of 99.635 percent, and ROC-AUC of 99.947 percent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In actuality, phishing attacks remain one of the most prevalent cybersecurity risks in existence today, with malevolent actors constantly changing their strategies to successfully trick users. This paper presents an AI model for a phishing detection system that uses an ensemble approach to combine character-level Convolutional Neural Networks (CNN) and LightGBM with engineered features. Our system uses a character-level CNN to extract sequential features after extracting 36 lexical, structural, and domain-based features from the URLs. On a test dataset of 19,873 URLs, the ensemble model achieves an accuracy of 99.819 percent, precision of 100 percent, recall of 99.635 percent, and ROC-AUC of 99.947 percent. Through a FastAPI-based service with an intuitive user interface, the suggested system has been utilised to offer real-time detection. In contrast, the results demonstrate that the suggested solution performs better than individual models; LightGBM contributes 40 percent and character-CNN contributes 60 percent to the final prediction. The suggested method maintains extremely low false positive rates while doing a good job of identifying contemporary phishing techniques. Index Terms - Phishing detection, machine learning, deep learning, CNN, ensemble methods, cybersecurity, URL analysis
Related papers
- Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features [0.0]
Phishing is a cybercrime in which individuals are deceived into revealing personal information.<n>This study proposes a Quantile Deep Q-Network (QR-DQN) approach that integrates RoBERTa semantic embeddings with handcrafted lexical features.<n>A diverse dataset of 105,000 URLs was curated from PhishTank, OpenPhish, Cloudflare, and other sources.
arXiv Detail & Related papers (2025-12-07T17:08:12Z) - A Graph-Attentive LSTM Model for Malicious URL Detection [0.0]
Blacklist detection methods fail to identify new or obfuscated URLs because they depend on pre-existing patterns.<n>This work presents a hybrid deep learning model named GNN-GAT-LSTM that combines Graph Neural Networks (GNNs) with Graph Attention Networks (GATs) and Long Short-Term Memory (LSTM) networks.<n>The model transforms URLs into graphs through a process where characters become nodes that connect through edges.
arXiv Detail & Related papers (2025-10-12T17:10:29Z) - EXPLICATE: Enhancing Phishing Detection through Explainable AI and LLM-Powered Interpretability [44.2907457629342]
EXPLICATE is a framework that enhances phishing detection through a three-component architecture.<n>It is on par with existing deep learning techniques but has better explainability.<n>It addresses the critical divide between automated AI and user trust in phishing detection systems.
arXiv Detail & Related papers (2025-03-22T23:37:35Z) - Browser Extension for Fake URL Detection [0.0]
This paper presents a Browser Extension that uses machine learning models to enhance online security.
The proposed solution uses LGBM classifier for classification of Phishing websites.
The Model for Spam email detection uses Multinomial NB algorithm which has been trained on a dataset with over 5500 messages.
arXiv Detail & Related papers (2024-11-16T07:22:59Z) - PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis [1.102674168371806]
Phishing URL identification is the best way to address the problem.
Various machine learning and deep learning methods have been proposed to automate the detection of phishing URLs.
We propose a 1D Convolutional Neural Network (CNN) and trained the model with extensive features and a substantial amount of data.
arXiv Detail & Related papers (2024-04-27T17:13:49Z) - Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error accumulation and suboptimal performance issues in two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z) - FLTracer: Accurate Poisoning Attack Provenance in Federated Learning [38.47921452675418]
Federated Learning (FL) is a promising distributed learning approach that enables multiple clients to collaboratively train a shared global model.
Recent studies show that FL is vulnerable to various poisoning attacks, which can degrade the performance of global models or introduce backdoors into them.
We propose FLTracer, the first FL attack framework to accurately detect various attacks and trace the attack time, objective, type, and poisoned location of updates.
arXiv Detail & Related papers (2023-10-20T11:24:38Z) - PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool [12.468922937529966]
We propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD)
This measure computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction.
We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages.
arXiv Detail & Related papers (2022-07-13T20:44:03Z) - Adversarially robust deepfake media detection using fused convolutional
neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z) - Fusion of CNNs and statistical indicators to improve image
classification [65.51757376525798]
Convolutional Networks have dominated the field of computer vision for the last ten years.
Main strategy to prolong this trend relies on further upscaling networks in size.
We hypothesise that adding heterogeneous sources of information may be more cost-effective to a CNN than building a bigger network.
arXiv Detail & Related papers (2020-12-20T23:24:31Z) - Bayesian Optimization with Machine Learning Algorithms Towards Anomaly
Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in term of accuracy rate, precision, low-false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.