Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data
- URL: http://arxiv.org/abs/2505.10600v1
- Date: Thu, 15 May 2025 14:02:48 GMT
- Title: Enhancing IoT Cyber Attack Detection in the Presence of Highly Imbalanced Data
- Authors: Md. Ehsanul Haque, Md. Saymon Hosen Polash, Md Al-Imran Sanjida Simla, Md Alomgir Hossain, Sarwar Jahan,
- Abstract summary: This study uses hybrid sampling techniques to improve data imbalance detection accuracy in IoT domains.<n>We evaluate the performance of several machine learning models with respect to the classification of cyber-attacks.<n>Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the rapid growth in the number of Internet of Things (IoT) networks, the cyber risk has increased exponentially, and therefore, we have to develop effective IDS that can work well with highly imbalanced datasets. A high rate of missed threats can be the result, as traditional machine learning models tend to struggle in identifying attacks when normal data volume is much higher than the volume of attacks. For example, the dataset used in this study reveals a strong class imbalance with 94,659 instances of the majority class and only 28 instances of the minority class, making it quite challenging to determine rare attacks accurately. The challenges presented in this research are addressed by hybrid sampling techniques designed to improve data imbalance detection accuracy in IoT domains. After applying these techniques, we evaluate the performance of several machine learning models such as Random Forest, Soft Voting, Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and Logistic Regression with respect to the classification of cyber-attacks. The obtained results indicate that the Random Forest model achieved the best performance with a Kappa score of 0.9903, test accuracy of 0.9961, and AUC of 0.9994. Strong performance is also shown by the Soft Voting model, with an accuracy of 0.9952 and AUC of 0.9997, indicating the benefits of combining model predictions. Overall, this work demonstrates the value of hybrid sampling combined with robust model and feature selection for significantly improving IoT security against cyber-attacks, especially in highly imbalanced data environments.
Related papers
- Generative Active Adaptation for Drifting and Imbalanced Network Intrusion Detection [15.146203784334086]
We propose a generative active adaptation framework that minimizes labeling effort while enhancing model robustness.<n>We evaluate our end-to-end framework on both simulated IDS data and a real-world ISP dataset.<n>Our framework effectively enhances rare attack detection while reducing labeling costs, making it a scalable and adaptive solution for real-world intrusion detection.
arXiv Detail & Related papers (2025-03-04T21:49:42Z) - Hybrid Deep Convolutional Neural Networks Combined with Autoencoders And Augmented Data To Predict The Look-Up Table 2006 [2.082445711353476]
This study explores the development of a hybrid deep convolutional neural network (DCNN) model enhanced by autoencoders and data augmentation techniques.
By augmenting the original input features using three different autoencoder configurations, the model's predictive capabilities were significantly improved.
arXiv Detail & Related papers (2024-08-26T20:45:07Z) - PrivFED -- A Framework for Privacy-Preserving Federated Learning in Enhanced Breast Cancer Diagnosis [0.0]
This study introduces a federated learning framework, trained on the Wisconsin dataset, to mitigate challenges such as data scarcity and imbalance.
The model exhibits an average accuracy of 99.95% on edge devices and 98% on the central server.
arXiv Detail & Related papers (2024-05-13T18:01:57Z) - Comparative Analysis of Imbalanced Malware Byteplot Image Classification
using Transfer Learning [0.873811641236639]
Malware detectors help cyber-attacks by comparing malware signatures.
In this paper, the performance of six multiclass classification models is compared.
It is observed that the more the class imbalance less the number of epochs required for convergence.
arXiv Detail & Related papers (2023-10-04T11:33:36Z) - GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models [60.48306899271866]
We present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models.
We show high correlation and significantly reduced cost of GREAT Score when compared to the attack-based model ranking on RobustBench.
GREAT Score can be used for remote auditing of privacy-sensitive black-box models.
arXiv Detail & Related papers (2023-04-19T14:58:27Z) - A Dependable Hybrid Machine Learning Model for Network Intrusion
Detection [1.222622290392729]
We propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability.
Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022.
arXiv Detail & Related papers (2022-12-08T20:19:27Z) - From Environmental Sound Representation to Robustness of 2D CNN Models
Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.