Towards Trustworthy Keylogger detection: A Comprehensive Analysis of Ensemble Techniques and Feature Selections through Explainable AI
- URL: http://arxiv.org/abs/2505.16103v1
- Date: Thu, 22 May 2025 01:04:13 GMT
- Title: Towards Trustworthy Keylogger detection: A Comprehensive Analysis of Ensemble Techniques and Feature Selections through Explainable AI
- Authors: Monirul Islam Mahmud,
- Abstract summary: Keylogger detection involves monitoring for unusual system behaviors such as delays between typing and character display.<n>In this study, we provide a comprehensive analysis for keylogger detection with traditional machine learning models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keylogger detection involves monitoring for unusual system behaviors such as delays between typing and character display, analyzing network traffic patterns for data exfiltration. In this study, we provide a comprehensive analysis for keylogger detection with traditional machine learning models - SVC, Random Forest, Decision Tree, XGBoost, AdaBoost, Logistic Regression and Naive Bayes and advanced ensemble methods including Stacking, Blending and Voting. Moreover, feature selection approaches such as Information gain, Lasso L1 and Fisher Score are thoroughly assessed to improve predictive performance and lower computational complexity. The Keylogger Detection dataset from publicly available Kaggle website is used in this project. In addition to accuracy-based classification, this study implements the approach for model interpretation using Explainable AI (XAI) techniques namely SHAP (Global) and LIME (Local) to deliver finer explanations for how much each feature contributes in assisting or hindering the detection process. To evaluate the models result, we have used AUC score, sensitivity, Specificity, Accuracy and F1 score. The best performance was achieved by AdaBoost with 99.76% accuracy, F1 score of 0.99, 100% precision, 98.6% recall, 1.0 specificity and 0.99 of AUC that is near-perfect classification with Fisher Score.
Related papers
- Evaluating Ensemble and Deep Learning Models for Static Malware Detection with Dimensionality Reduction Using the EMBER Dataset [0.0]
This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset.<n>We evaluate eight classification models: LightGBM, XGBoost, CatBoost, Random Forest, Extra Trees, HistGradientBoosting, k-Nearest Neighbors (KNN), and TabNet.<n>The models are assessed on accuracy, precision, recall, F1 score, and AUC to examine both predictive performance and robustness.
arXiv Detail & Related papers (2025-07-22T18:45:10Z) - A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.<n>We present around 55 distinguished features that are extracted from industrial images, which are then analyzed using statistical methods.<n>By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z) - Explainable AI for Comparative Analysis of Intrusion Detection Models [20.683181384051395]
This research analyzes various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic.
We trained all models to the accuracy of 90% on the UNSW-NB15 dataset.
We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness.
arXiv Detail & Related papers (2024-06-14T03:11:01Z) - Ransomware detection using stacked autoencoder for feature selection [0.0]
The study meticulously analyzes the autoencoder's learned weights and activations to identify essential features for distinguishing ransomware families from other malware.
The proposed model achieves an exceptional 99% accuracy in ransomware classification, surpassing the Extreme Gradient Boosting (XGBoost) algorithm.
arXiv Detail & Related papers (2024-02-17T17:31:48Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge
Distillation [86.41437210485932]
We aim at advancing zero-shot HOI detection to detect both seen and unseen HOIs simultaneously.
We propose a novel end-to-end zero-shot HOI Detection framework via vision-language knowledge distillation.
Our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP.
arXiv Detail & Related papers (2022-04-01T07:27:19Z) - Explainable AI Integrated Feature Selection for Landslide Susceptibility
Mapping using TreeSHAP [0.0]
An early prediction of landslide susceptibility using a data-driven approach is a demand of time.
We employed state-of-the-art machine learning algorithms including XgBoost, LR, KNN, SVM, and Adaboost for landslide susceptibility prediction.
An optimized version of XgBoost along with feature reduction by 40 % has outperformed all other classifiers in terms of popular evaluation metrics.
arXiv Detail & Related papers (2022-01-10T09:17:21Z) - Utilizing XAI technique to improve autoencoder based model for computer
network anomaly detection with shapley additive explanation(SHAP) [0.0]
Machine learning (ML) and Deep Learning (DL) methods are being adopted rapidly, especially in computer network security.
Lack of transparency of ML and DL based models is a major obstacle to their implementation and criticized due to its black-box nature.
XAI is a promising area that can improve the trustworthiness of these models by giving explanations and interpreting its output.
arXiv Detail & Related papers (2021-12-14T09:42:04Z) - A Survey of Machine Learning Algorithms for Detecting Ransomware
Encryption Activity [0.0]
A survey of machine learning techniques trained to detect ransomware is presented.
This work builds upon the efforts of Taylor et al. in using sensor-based methods to identify encryption activity.
A random forest model produces scores of 93% accuracy and 92% F1, showing that sensor-based detection is currently a viable option to detect even zero-day ransomware attacks before the code fully executes.
arXiv Detail & Related papers (2021-10-14T18:02:31Z) - Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and
Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Self-Attentive Classification-Based Anomaly Detection in Unstructured
Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.