Related papers: Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN

URL: http://arxiv.org/abs/2502.01518v1
Date: Mon, 03 Feb 2025 16:51:58 GMT
Title: Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN
Authors: Gazi Tanbhir, Md. Farhan Shahriyar, Khandker Shahed, Abdullah Md Raihan Chy, Md Al Adnan,
Abstract summary: Smishing attacks have surged by 328%, posing a major threat to mobile users.<n>Despite its growing prevalence, the issue remains significantly under-addressed.<n>This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Smishing is a social engineering attack using SMS containing malicious content to deceive individuals into disclosing sensitive information or transferring money to cybercriminals. Smishing attacks have surged by 328%, posing a major threat to mobile users, with losses exceeding \$54.2 million in 2019. Despite its growing prevalence, the issue remains significantly under-addressed. This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts, combining Bidirectional Encoder Representations from Transformers (BERT) with Convolutional Neural Networks (CNNs) for enhanced character-level analysis. Our model addresses multi-class classification by distinguishing between Normal, Promotional, and Smishing SMS. Unlike traditional binary classification methods, our approach integrates BERT's contextual embeddings with CNN's character-level features, improving detection accuracy. Enhanced by an attention mechanism, the model effectively prioritizes crucial text segments. Our model achieves 98.47% accuracy, outperforming traditional classifiers, with high precision and recall in Smishing detection, and strong performance across all categories.

Related papers

GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks [2.184092672461171]
We propose a novel spam-text detection framework, GCC-Spam, which integrates three core innovations.<n>Character similarity network captures orthographic and phonetic features to counter character-obfuscation attacks.<n> contrastive learning enhances discriminability by optimizing the latent-space distance between spam and normal texts.<n>Generative Adversarial Network (GAN) generates realistic pseudo-spam samples to alleviate data scarcity.
arXiv Detail & Related papers (2025-07-19T16:09:48Z)
Hybrid LLM-Enhanced Intrusion Detection for Zero-Day Threats in IoT Networks [6.087274577167399]
This paper presents a novel approach to intrusion detection by integrating traditional signature-based methods with the contextual understanding capabilities of the GPT-2 Large Language Model (LLM)<n>We propose a hybrid IDS framework that merges the robustness of signature-based techniques with the adaptability of GPT-2-driven semantic analysis.<n> Experimental evaluations on a representative intrusion dataset demonstrate that our model enhances detection accuracy by 6.3%, reduces false positives by 9.0%, and maintains near real-time responsiveness.
arXiv Detail & Related papers (2025-07-10T04:10:03Z)
Machine Learning Driven Smishing Detection Framework for Mobile Security [0.46873264197900916]
smishing is a sophisticated variant of phishing conducted via SMS.<n>Traditional detection methods struggle with the informal and evolving nature of SMS language.<n>This paper presents an enhanced content-based smishing detection framework.
arXiv Detail & Related papers (2024-12-09T08:20:20Z)
KLCBL: An Improved Police Incident Classification Model [0.0]
Police incident data is crucial for public security intelligence, yet grassroots agencies struggle with efficient classification due to manual inefficiency and automated system limitations. This research proposes a multichannel neural network model, KLCBL, integrating Kolmogorov-Arnold Networks (KAN), a linguistically enhanced text preprocessing approach (LERT), Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM) for police incident classification. The model addresses classification challenges, enhances police informatization, improves resource allocation, and offers broad applicability to other classification tasks.
arXiv Detail & Related papers (2024-11-11T07:02:23Z)
Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models. Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20 % decrease in accuracy for the top-performing text classification models post-attack, along with a 30 % decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts. We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks. We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness. Our model outclasses the state-of-the-art for resilient fault prediction benchmarking, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning from model-centric attacks. Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs. We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
Text generation for dataset augmentation in security classification tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks. We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
arXiv Detail & Related papers (2023-10-22T22:25:14Z)
Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices [3.340416780217405]
This paper presents SecurityBERT, a novel architecture that leverages the Bidirectional Representations from Transformers (BERT) model for cyber threat detection in IoT networks. Our research demonstrates that SecurityBERT outperforms traditional Machine Learning (ML) and Deep Learning (DL) methods, such as Convolutional Neural Networks (CNNIoTs) or Recurrent Neural Networks (IoTRNNs) in cyber threat detection. SecurityBERT achieved an impressive 98.2% overall accuracy in identifying fourteen distinct attack types, surpassing previous records set by hybrid solutions.
arXiv Detail & Related papers (2023-06-25T15:04:21Z)
Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks. Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
A New Deep Boosted CNN and Ensemble Learning based IoT Malware Detection [0.0]
Security issues are threatened in various types of networks, especially in the Internet of Things (IoT) environment. We have developed a new malware detection framework, Deep Squeezed-Boosted and Ensemble Learning (DSBEL), comprised of novel Squeezed-Boosted Boundary-Region Split-Transform-Merge (SB-BR-STM) CNN and ensemble learning.
arXiv Detail & Related papers (2022-12-15T18:14:51Z)
Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically. As a result, the model achieved high precision, recall, f1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.