Detection of Malicious Websites Using Machine Learning Techniques
- URL: http://arxiv.org/abs/2209.09630v1
- Date: Tue, 13 Sep 2022 13:48:31 GMT
- Title: Detection of Malicious Websites Using Machine Learning Techniques
- Authors: Adebayo Oshingbesan, Courage Ekoh, Chukwuemeka Okobi, Aime Munezero,
Kagame Richard
- Abstract summary: K-Nearest Neighbor is the only model that performs consistently high across datasets.
Other models such as Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines also consistently outperform a baseline model of predicting every link as malicious.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In detecting malicious websites, a common approach is the use of blacklists
which are not exhaustive in themselves and are unable to generalize to new
malicious sites. Detecting newly encountered malicious websites automatically
will help reduce the vulnerability to this form of attack. In this study, we
explored the use of ten machine learning models to classify malicious websites
based on lexical features and understand how they generalize across datasets.
Specifically, we trained, validated, and tested these models on different sets
of datasets and then carried out a cross-datasets analysis. From our analysis,
we found that K-Nearest Neighbor is the only model that performs consistently
high across datasets. Other models such as Random Forest, Decision Trees,
Logistic Regression, and Support Vector Machines also consistently outperform a
baseline model of predicting every link as malicious across all metrics and
datasets. Also, we found no evidence that any subset of lexical features
generalizes across models or datasets. This research should be relevant to
cybersecurity professionals and academic researchers as it could form the basis
for real-life detection systems or further research work.
Related papers
- Evaluating Predictive Models in Cybersecurity: A Comparative Analysis of Machine and Deep Learning Techniques for Threat Detection [0.0]
This paper examines and compares various machine learning as well as deep learning models to choose the most suitable ones for detecting and fighting against cybersecurity risks.
The two datasets are used in the study to assess models like Naive Bayes, SVM, Random Forest, and deep learning architectures, i.e., VGG16, in the context of accuracy, precision, recall, and F1-score.
arXiv Detail & Related papers (2024-07-08T15:05:59Z) - On the Cross-Dataset Generalization of Machine Learning for Network
Intrusion Detection [50.38534263407915]
Network Intrusion Detection Systems (NIDS) are a fundamental tool in cybersecurity.
Their ability to generalize across diverse networks is a critical factor in their effectiveness and a prerequisite for real-world applications.
In this study, we conduct a comprehensive analysis on the generalization of machine-learning-based NIDS through an extensive experimentation in a cross-dataset framework.
arXiv Detail & Related papers (2024-02-15T14:39:58Z) - IoTGeM: Generalizable Models for Behaviour-Based IoT Attack Detection [3.3772986620114387]
We present an approach for modelling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance.
First, we present an improved rolling window approach for feature extraction, and introduce a multi-step feature selection process that reduces overfitting.
Second, we build and test models using isolated train and test datasets, thereby avoiding common data leaks.
Third, we rigorously evaluate our methodology using a diverse portfolio of machine learning models, evaluation metrics and datasets.
arXiv Detail & Related papers (2023-10-17T21:46:43Z) - Data-Free Model Extraction Attacks in the Context of Object Detection [0.6719751155411076]
A significant number of machine learning models are vulnerable to model extraction attacks.
We propose an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection.
We find that the proposed model extraction method achieves significant results by using reasonable queries.
arXiv Detail & Related papers (2023-08-09T06:23:54Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Give access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - An Adversarial Attack Analysis on Malicious Advertisement URL Detection
Framework [22.259444589459513]
Malicious advertisement URLs pose a security risk since they are the source of cyber-attacks.
Existing malicious URL detection techniques are limited and to handle unseen features as well as generalize to test data.
In this study, we extract a novel set of lexical and web-scrapped features and employ machine learning technique to set up system for fraudulent advertisement URLs detection.
arXiv Detail & Related papers (2022-04-27T20:06:22Z) - Finding Facial Forgery Artifacts with Parts-Based Detectors [73.08584805913813]
We design a series of forgery detection systems that each focus on one individual part of the face.
We use these detectors to perform detailed empirical analysis on the FaceForensics++, Celeb-DF, and Facebook Deepfake Detection Challenge datasets.
arXiv Detail & Related papers (2021-09-21T16:18:45Z) - Learning to Detect: A Data-driven Approach for Network Intrusion
Detection [17.288512506016612]
We perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks.
Unlike previous shallow learning and deep learning models that use the single learning model approach for intrusion detection, we adopt a hierarchy strategy.
We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks.
arXiv Detail & Related papers (2021-08-18T21:19:26Z) - Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
arXiv Detail & Related papers (2021-04-20T17:16:41Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.