Ensemble Classifier Design Tuned to Dataset Characteristics for Network
Intrusion Detection
- URL: http://arxiv.org/abs/2205.06177v1
- Date: Sun, 8 May 2022 21:06:42 GMT
- Title: Ensemble Classifier Design Tuned to Dataset Characteristics for Network
Intrusion Detection
- Authors: Zeinab Zoghi, Gursel Serpen
- Abstract summary: Two new algorithms are proposed to address the class overlap issue in the dataset.
The proposed design is evaluated for both binary and multi-category classification.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning-based supervised approaches require highly customized and
fine-tuned methodologies to deliver outstanding performance. This paper
presents a dataset-driven design and performance evaluation of a machine
learning classifier for the network intrusion dataset UNSW-NB15. Analysis of
the dataset suggests that it suffers from class representation imbalance and
class overlap in the feature space. We employed ensemble methods using Balanced
Bagging (BB), eXtreme Gradient Boosting (XGBoost), and Random Forest empowered
by Hellinger Distance Decision Tree (RF-HDDT). BB and XGBoost are tuned to
handle the imbalanced data, and the Random Forest (RF) classifier is supplemented
by the Hellinger metric to address the imbalance issue. Two new algorithms are
proposed to address the class overlap issue in the dataset. These two
algorithms are leveraged to help improve performance on the testing dataset
by modifying the final classification decision made by three base classifiers
as part of the ensemble classifier which employs a majority vote combiner. The
proposed design is evaluated for both binary and multi-category classification.
Comparing the proposed model to those reported on the same dataset in the
literature demonstrates that the proposed model outperforms others by a
significant margin for both binary and multi-category classification cases.
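The abstract mentions two building blocks that can be illustrated in a few lines: the Hellinger distance used as a split criterion in HDDT-style trees, and a majority vote combiner over three base classifiers. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names `hellinger_split_score` and `majority_vote`, the example counts, and the tie-breaking rule are all hypothetical, and the two new class-overlap algorithms from the paper are not reproduced here because the abstract does not describe them.

```python
from collections import Counter
from math import sqrt


def hellinger_split_score(pos_counts, neg_counts):
    """Hellinger distance between the two class-conditional distributions
    over the branches of a candidate split (binary-class case).

    pos_counts[j] / neg_counts[j] are the numbers of positive / negative
    training samples that fall into branch j. Each class is normalised by
    its own total, so the score does not depend on the class ratio.
    """
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present to score a split")
    return sqrt(sum(
        (sqrt(p / pos_total) - sqrt(n / neg_total)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions is a list with one entry per base classifier, each entry
    being that classifier's predicted labels for the same samples.
    Assumption: on a tie, the label voted first in classifier order wins.
    """
    combined = []
    for votes in zip(*predictions):
        label, _ = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical example: a split that separates 10 attack samples from
# 90 normal samples perfectly, and one that carries no information.
perfect = hellinger_split_score([10, 0], [0, 90])
useless = hellinger_split_score([5, 5], [45, 45])

# Hypothetical predictions from three base classifiers on three samples
# (standing in for BB, XGBoost and RF-HDDT).
bb_pred = [0, 1, 1]
xgb_pred = [0, 0, 1]
rf_hddt_pred = [1, 1, 1]
final = majority_vote([bb_pred, xgb_pred, rf_hddt_pred])
```

Because each class is normalised by its own sample count, the split score is the same whether the minority class has 10 samples or 10,000, which is the usual motivation for using the Hellinger metric on imbalanced data.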
Related papers
- Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
- Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset.
We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z)
- Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two-class imbalanced data classification.
Experimental results on the specified data sets show that, depending on the size of the training set, the proposed method achieves 40% and 50% better accuracy in minority-class prediction.
arXiv Detail & Related papers (2021-06-02T14:14:38Z)
- SetConv: A New Approach for Learning from Imbalanced Data [29.366843553056594]
We propose a set convolution operation and an episodic training strategy to extract a single representative for each class.
We prove that our proposed algorithm is permutation-invariant with respect to the order of inputs.
arXiv Detail & Related papers (2021-04-03T22:33:30Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from m U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings [19.763768111774134]
Quadratic discriminant analysis (QDA) or its regularized version (R-QDA) is often not recommended for classification.
We propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias.
arXiv Detail & Related papers (2020-06-11T12:17:05Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
- Diversity-Aware Weighted Majority Vote Classifier for Imbalanced Data [1.2944868613449219]
We propose a diversity-aware ensemble learning based algorithm, DAMVI, to deal with imbalanced binary classification tasks.
We show the efficiency of the proposed approach with respect to state-of-the-art models on predictive maintenance, credit card fraud detection, webpage classification and medical applications.
arXiv Detail & Related papers (2020-04-16T11:27:50Z)
- Adaptive Name Entity Recognition under Highly Unbalanced Data [5.575448433529451]
We present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (BI-LSTM) layer for solving NER tasks.
We introduce an add-on classification model to split sentences into two different sets, Weak and Strong classes, and then design a pair of Bi-LSTM-CRF models to optimize performance on each set.
arXiv Detail & Related papers (2020-03-10T06:56:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.