Machine Learning Driven Smishing Detection Framework for Mobile Security
- URL: http://arxiv.org/abs/2412.09641v1
- Date: Mon, 09 Dec 2024 08:20:20 GMT
- Title: Machine Learning Driven Smishing Detection Framework for Mobile Security
- Authors: Diksha Goel, Hussain Ahmad, Ankit Kumar Jain, Nikhil Kumar Goel,
- Abstract summary: Smishing is a sophisticated variant of phishing conducted via SMS.
Traditional detection methods struggle with the informal and evolving nature of SMS language.
This paper presents an enhanced content-based smishing detection framework.
- Score: 0.46873264197900916
- License:
- Abstract: The increasing reliance on smartphones for communication, financial transactions, and personal data management has made them prime targets for cyberattacks, particularly smishing, a sophisticated variant of phishing conducted via SMS. Despite the growing threat, traditional detection methods often struggle with the informal and evolving nature of SMS language, which includes abbreviations, slang, and short forms. This paper presents an enhanced content-based smishing detection framework that leverages advanced text normalization techniques to improve detection accuracy. By converting nonstandard text into its standardized form, the proposed model enhances the efficacy of machine learning classifiers, particularly the Naive Bayesian classifier, in distinguishing smishing messages from legitimate ones. Our experimental results, validated on a publicly available dataset, demonstrate a detection accuracy of 96.2%, with a low False Positive Rate of 3.87% and False Negative Rate of 2.85%. This approach significantly outperforms existing methodologies, providing a robust solution to the increasingly sophisticated threat of smishing in the mobile environment.
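The abstract describes a two-stage pipeline: normalize nonstandard SMS text (abbreviations, slang, short forms), then classify with Naive Bayes. It reports accuracy, false positive rate, and false negative rate. The sketch below illustrates that idea under stated assumptions. It is not the authors' implementation or dataset. The slang lexicon `NORMALIZATION_MAP`, the toy messages, and all helper names are hypothetical. A plain multinomial Naive Bayes with add-alpha smoothing stands in for whatever variant the paper uses.

```python
import math
import re
from collections import Counter

# Hypothetical slang/abbreviation lexicon for illustration only; the paper's
# actual normalization resources are not described in the abstract.
NORMALIZATION_MAP = {
    "u": "you", "ur": "your", "r": "are", "pls": "please", "plz": "please",
    "txt": "text", "msg": "message", "acct": "account", "2day": "today",
    "b4": "before", "gr8": "great", "frm": "from",
}


def normalize(text):
    """Lowercase, split into alphanumeric tokens, and expand known short forms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [NORMALIZATION_MAP.get(tok, tok) for tok in tokens]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over normalized tokens, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(normalize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, tokens, c):
        # Unnormalized log P(c) + sum of log P(token | c); unseen tokens are skipped.
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        tokens = normalize(doc)
        return max(self.classes, key=lambda c: self._log_posterior(tokens, c))


def error_rates(y_true, y_pred, positive="smishing"):
    """Return (accuracy, FPR, FNR), treating `positive` as the smishing class.

    FPR = FP / (FP + TN): share of legitimate messages flagged as smishing.
    FNR = FN / (FN + TP): share of smishing messages that slip through.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return accuracy, fpr, fnr


# Toy, made-up messages; not drawn from the paper's dataset.
TRAIN_DOCS = [
    "URGENT ur acct is locked, verify now at http://bit.ly/x",
    "Congrats u won a prize, claim 2day plz",
    "Your bank acct needs verification, click link",
    "hey r we still meeting 2day?",
    "pls call me when u get this msg",
    "see you at lunch, gr8 idea",
]
TRAIN_LABELS = ["smishing", "smishing", "smishing", "ham", "ham", "ham"]

model = MultinomialNaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
print(model.predict("Verify ur bank acct now"))       # smishing
print(model.predict("r we meeting 2day for lunch?"))  # ham
```

Normalization matters here because "ur acct" and "your account" map to the same tokens, so the classifier pools their counts instead of treating them as unrelated words. With a real labeled SMS corpus and a held-out split, `error_rates` would report the same kinds of metrics as the abstract. The values would not necessarily match the paper's 96.2% accuracy, 3.87% FPR, and 2.85% FNR.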
Related papers
- Hybrid Machine Learning Model for Detecting Bangla Smishing Text Using BERT and Character-Level CNN [0.0]
Smishing attacks have surged by 328%, posing a major threat to mobile users.
Despite its growing prevalence, the issue remains significantly under-addressed.
This paper presents a novel hybrid machine learning model for detecting Bangla smishing texts.
Date: 2025-02-03T16:51:58Z
- Detection and Prevention of Smishing Attacks [0.0]
This work presents a smishing detection model using a content-based analysis approach.
To address the challenge posed by slang, abbreviations, and short forms in text communication, the model normalizes these into standard forms.
Experimental results demonstrate the model effectiveness, achieving classification accuracies of 97.14% for smishing and 96.12% for ham messages.
Date: 2024-12-31T04:07:12Z
- TextSleuth: Towards Explainable Tampered Text Detection [49.88698441048043]
We propose to explain the basis of tampered text detection with natural language via large multimodal models.
To fill the data gap for this task, we propose a large-scale, comprehensive dataset, ETTD.
Elaborate queries are introduced to generate high-quality anomaly descriptions with GPT4o.
To automatically filter out low-quality annotations, we also propose to prompt GPT4o to recognize tampered texts.
Date: 2024-12-19T13:10:03Z
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
Date: 2024-10-04T18:42:09Z
- SMS Spam Detection and Classification to Combat Abuse in Telephone Networks Using Natural Language Processing [0.0]
This research addresses the pervasive issue of SMS spam, which poses threats to users' privacy and security.
The study introduces a novel approach utilizing Natural Language Processing (NLP) and machine learning models, particularly BERT (Bidirectional Encoder Representations from Transformers), for spam detection and classification.
Evaluation results revealed that the Naïve Bayes + BERT model achieves the highest accuracy at 97.31% with the fastest execution time of 0.3 seconds on the test dataset.
Date: 2024-06-04T13:44:36Z
- ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis [2.849988619791745]
The volume of SMS spam has grown significantly in recent years.
The unstructured format of SMS data creates significant challenges for SMS spam detection.
We employ optimized and fine-tuned transformer-based Large Language Models (LLMs) to solve the problem of spam message detection.
Date: 2024-05-12T11:42:05Z
- Android Malware Detection with Unbiased Confidence Guarantees [1.6432632226868131]
We propose a machine learning dynamic analysis approach that provides provably valid confidence guarantees in each malware detection.
The proposed approach is based on a novel machine learning framework, called Conformal Prediction, combined with a random forests classifier.
We examine its performance on a large-scale dataset collected by installing 1866 malicious and 4816 benign applications on a real Android device.
Date: 2023-12-17T11:07:31Z
- Text generation for dataset augmentation in security classification tasks [55.70844429868403]
This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks.
We find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.
Date: 2023-10-22T22:25:14Z
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
Date: 2023-03-14T16:11:47Z
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, F1-score and accuracy of 98.38%.
Date: 2021-10-10T17:19:37Z
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning, without ground truth, word substitutions along with their locations.
Date: 2020-09-07T11:01:24Z
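The Android malware entry in this list combines Conformal Prediction with a random forests classifier to attach confidence guarantees to each detection. The core computation can be illustrated with a minimal split (inductive) conformal sketch. It is an illustration only, not that paper's method. It assumes nonconformity scores have already been computed, for example as 1 minus the fraction of trees voting for a label. The function names and example scores are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Split-conformal p-value for one candidate label.

    calibration_scores: nonconformity scores of held-out examples paired with their true labels.
    test_score: nonconformity score of the test example paired with the candidate label.
    """
    n = len(calibration_scores)
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    # The +1 in numerator and denominator accounts for the test example itself.
    return (at_least_as_strange + 1) / (n + 1)


def prediction_set(calibration_scores, candidate_scores, epsilon):
    """Return every candidate label whose p-value exceeds the significance level epsilon.

    If calibration and test examples are exchangeable, the true label is left out
    of this set with probability at most epsilon.
    """
    return sorted(
        label for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > epsilon
    )


# Hypothetical nonconformity scores, e.g. 1 - a classifier's probability for the label.
CALIBRATION = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
CANDIDATES = {"malware": 0.15, "benign": 0.85}

print(prediction_set(CALIBRATION, CANDIDATES, 0.1))  # ['benign', 'malware']
print(prediction_set(CALIBRATION, CANDIDATES, 0.2))  # ['malware']
```

A stricter significance level (smaller epsilon) yields larger, more cautious prediction sets. A single-label set at a given epsilon is what lets a detector report that label with a stated confidence.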