OutCenTR: A novel semi-supervised framework for predicting exploits of
vulnerabilities in high-dimensional datasets
- URL: http://arxiv.org/abs/2304.10511v1
- Date: Mon, 3 Apr 2023 00:34:41 GMT
- Title: OutCenTR: A novel semi-supervised framework for predicting exploits of
vulnerabilities in high-dimensional datasets
- Authors: Hadi Eskandari, Michael Bewong, Sabih ur Rehman
- Abstract summary: We make use of outlier detection techniques to predict vulnerabilities that are likely to be exploited.
We propose a dimensionality reduction technique, OutCenTR, that enhances the baseline outlier detection models.
The results of our experiments show on average a 5-fold improvement of F1 score in comparison with state-of-the-art dimensionality reduction techniques.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: An ever-growing number of vulnerabilities are reported every day. Yet these
vulnerabilities are not all the same; some are more targeted than others.
Correctly estimating the likelihood of a vulnerability being exploited is a
critical task for system administrators, as it helps them prioritize and patch the
right vulnerabilities. Our work makes use of
outlier detection techniques to predict vulnerabilities that are likely to be
exploited in highly imbalanced and high-dimensional datasets such as the
National Vulnerability Database. We propose a dimensionality reduction
technique, OutCenTR, that enhances the baseline outlier detection models. We
further demonstrate the effectiveness and efficiency of OutCenTR empirically
with 4 benchmark and 12 synthetic datasets. The results of our experiments show
on average a 5-fold improvement of F1 score in comparison with state-of-the-art
dimensionality reduction techniques such as PCA and GRP.
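As a rough illustration of the pipeline described in the abstract, the sketch below runs a baseline outlier detector on a synthetic, highly imbalanced feature matrix, with PCA standing in for the dimensionality-reduction step; OutCenTR itself is not reproduced here, and the data and parameters are assumptions.

```python
# Sketch: outlier detection as exploit prediction on an imbalanced,
# high-dimensional vulnerability dataset. PCA stands in for the
# dimensionality-reduction step; OutCenTR itself is not reproduced here.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 100))            # hypothetical vulnerability features
y = (rng.random(5000) < 0.02).astype(int)   # ~2% exploited -> highly imbalanced
X[y == 1] += 3.0                            # exploited samples drift from the bulk

X_red = PCA(n_components=10).fit_transform(X)    # baseline reduction (PCA/GRP-style)
iso = IsolationForest(contamination=0.02, random_state=0).fit(X_red)
pred = (iso.predict(X_red) == -1).astype(int)    # -1 = outlier -> predicted exploit

print("F1:", round(f1_score(y, pred), 3))
```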
Related papers
- Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation [4.374800396968465]
We propose a data augmentation technique aimed at enhancing the performance of pre-trained language models for vulnerability detection.
Fine-tuning a series of representative pre-trained code models on the augmented dataset yields up to a 10.1% increase in accuracy and a 23.6% increase in F1.
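A minimal sketch of one common semantic-preserving transformation (consistent identifier renaming) is shown below; the paper's actual transformation set may differ, and the example function is hypothetical.

```python
# Sketch: semantic-preserving augmentation by consistent identifier renaming.
# A generic example of the idea, not the paper's exact transformation set.
import re

def rename_identifiers(code: str, mapping: dict) -> str:
    """Rename whole-word identifiers per `mapping`; program behavior is unchanged."""
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

original = "int add(int count, int step) { return count + step; }"
augmented = rename_identifiers(original, {"count": "a", "step": "b"})
print(augmented)  # int add(int a, int b) { return a + b; }
```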
arXiv Detail & Related papers (2024-09-30T21:44:05Z)
- A Relevance Model for Threat-Centric Ranking of Cybersecurity Vulnerabilities [0.29998889086656577]
The relentless process of tracking and remediating vulnerabilities is a top concern for cybersecurity professionals.
We provide a framework for vulnerability management specifically focused on mitigating threats using adversary criteria derived from MITRE ATT&CK.
Our results show an average 71.5% - 91.3% improvement in identifying vulnerabilities likely to be targeted and exploited by cyber threat actors.
arXiv Detail & Related papers (2024-06-09T23:29:12Z)
- Enhancing Code Vulnerability Detection via Vulnerability-Preserving Data Augmentation [29.72520866016839]
Source code vulnerability detection aims to identify inherent vulnerabilities to safeguard software systems from potential attacks.
Many prior studies overlook diverse vulnerability characteristics, simplifying the problem into a binary (0-1) classification task.
FGVulDet employs multiple classifiers to discern characteristics of various vulnerability types and combines their outputs to identify the specific type of vulnerability.
FGVulDet is trained on a large-scale dataset from GitHub, encompassing five different types of vulnerabilities.
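A minimal sketch of the multi-classifier idea (one binary detector per vulnerability type, outputs combined to pick the most likely type) follows; the features, type names, and combination rule are placeholders, not FGVulDet's actual architecture.

```python
# Sketch: one binary classifier per vulnerability type; combine their scores to
# decide whether code is vulnerable and, if so, of which type. Placeholders only.
import numpy as np
from sklearn.linear_model import LogisticRegression

VULN_TYPES = ["buffer_overflow", "use_after_free", "integer_overflow",
              "null_deref", "double_free"]           # five types, as in the summary

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))                      # placeholder code embeddings
y = rng.integers(0, len(VULN_TYPES) + 1, size=1000)  # 0 = not vulnerable

detectors = {vt: LogisticRegression(max_iter=1000).fit(X, (y == i).astype(int))
             for i, vt in enumerate(VULN_TYPES, start=1)}

def classify(x):
    scores = {vt: clf.predict_proba(x.reshape(1, -1))[0, 1]
              for vt, clf in detectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0.5 else "not_vulnerable"

print(classify(X[0]))
```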
arXiv Detail & Related papers (2024-04-15T09:10:52Z)
- Vulnerability Detection with Code Language Models: How Far Are We? [40.455600722638906]
PrimeVul is a new dataset for training and evaluating code LMs for vulnerability detection.
It incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks.
It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues.
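Below is a minimal sketch of de-duplication plus a chronological train/test split to avoid leakage; the field names and the 80/20 ratio are assumptions, not PrimeVul's exact procedure.

```python
# Sketch: de-duplicate samples by a content hash, then split chronologically so
# that test commits are strictly newer than training commits (no data leakage).
import hashlib

def dedup_and_split(samples, test_frac=0.2):
    seen, unique = set(), []
    for s in samples:
        h = hashlib.sha256(s["code"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(s)
    unique.sort(key=lambda s: s["commit_date"])       # oldest first
    cut = int(len(unique) * (1 - test_frac))
    return unique[:cut], unique[cut:]                 # train, test

samples = [
    {"code": "void f(){}", "commit_date": "2019-05-01", "label": 0},
    {"code": "void f(){}", "commit_date": "2020-01-01", "label": 0},  # duplicate
    {"code": "void g(char*s){strcpy(buf,s);}", "commit_date": "2021-03-10", "label": 1},
]
train, test = dedup_and_split(samples)
print(len(train), len(test))
```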
arXiv Detail & Related papers (2024-03-27T14:34:29Z)
- FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outperforms the state of the art in resilient fault prediction benchmarks, with an accuracy of up to 0.958.
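A minimal sketch of one standard online adversarial-training step (FGSM-style perturbations mixed into each batch) is shown below in PyTorch; this is a generic robustness recipe with an assumed model and budget, not FaultGuard's specific technique.

```python
# Sketch: one FGSM-style adversarial training step in PyTorch.
# A generic robustness recipe, not FaultGuard's exact method or model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))  # fault types
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
eps = 0.05  # perturbation budget (assumed)

def train_step(x, y):
    x_req = x.clone().requires_grad_(True)
    loss_fn(model(x_req), y).backward()
    x_adv = (x_req + eps * x_req.grad.sign()).detach()   # FGSM perturbation
    opt.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)  # clean + adversarial
    loss.backward()
    opt.step()
    return loss.item()

x, y = torch.randn(32, 16), torch.randint(0, 4, (32,))
print(train_step(x, y))
```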
arXiv Detail & Related papers (2024-03-26T08:51:23Z)
- DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism [3.9377491512285157]
DefectHunter is an innovative model for vulnerability identification that employs the Conformer mechanism.
This mechanism fuses self-attention with convolutional networks to capture both local, position-wise features and global, content-based interactions.
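A minimal sketch of a Conformer-style block (self-attention for global context followed by a depthwise 1-D convolution for local features) follows, in PyTorch; the layer sizes and ordering are assumptions, not DefectHunter's implementation.

```python
# Sketch: a Conformer-style block combining self-attention (global, content-based
# interactions) with a depthwise 1-D convolution (local, position-wise features).
import torch
import torch.nn as nn

class ConformerishBlock(nn.Module):
    def __init__(self, dim=128, heads=4, kernel=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)                    # global context via attention
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + c)                 # local features via convolution

tokens = torch.randn(2, 50, 128)                 # e.g. embedded code tokens
print(ConformerishBlock()(tokens).shape)         # torch.Size([2, 50, 128])
```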
arXiv Detail & Related papers (2023-09-27T00:10:29Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
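Below is a much-simplified sketch of the "relaxed target" idea: stop pushing the training loss below a preset target so the model does not overfit its training members. The target value and gating rule are assumptions; this is not RelaxLoss's full algorithm.

```python
# Sketch of a relaxed loss target: once the batch loss falls below alpha,
# stop decreasing it (here: a gentle ascent back toward the target).
import torch
import torch.nn as nn

model = nn.Linear(20, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
alpha = 0.4  # target loss value (assumed)

def relaxed_step(x, y):
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    if loss.item() > alpha:
        loss.backward()        # normal descent toward the target
    else:
        (-loss).backward()     # push the loss back up toward alpha
    opt.step()
    return loss.item()

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
print(relaxed_step(x, y))
```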
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence on labeled source data, then predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
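A minimal sketch of ATC as summarized above: choose the threshold so that the fraction of source points above it matches source accuracy, then report the fraction of unlabeled target points above it. Max softmax probability is used here as the confidence score, and the data is synthetic.

```python
# Sketch of ATC: learn a confidence threshold on labeled source data, then
# predict target accuracy as the fraction of unlabeled target points above it.
import numpy as np

def atc_predict_accuracy(source_probs, source_labels, target_probs):
    src_conf = source_probs.max(axis=1)
    src_acc = (source_probs.argmax(axis=1) == source_labels).mean()
    t = np.quantile(src_conf, 1.0 - src_acc)   # P(src_conf > t) ~= source accuracy
    return float((target_probs.max(axis=1) > t).mean())

rng = np.random.default_rng(0)
source_probs = rng.dirichlet(np.ones(3) * 2, size=1000)   # placeholder predictions
source_labels = rng.integers(0, 3, size=1000)
target_probs = rng.dirichlet(np.ones(3), size=1000)       # shifted, unlabeled target
print(atc_predict_accuracy(source_probs, source_labels, target_probs))
```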
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
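A minimal sketch of the ensemble idea (combining per-statement scores from a graph-based and a sequence-based model and ranking statements) follows; the two scorers are stubs and simple averaging stands in for VELVET's learned combination.

```python
# Sketch: ensemble two per-statement vulnerability scorers (graph-based and
# sequence-based) and rank statements by the combined score. Scorers are stubs.
import numpy as np

def graph_model_scores(statements):      # stub for the GNN over the program graph
    return np.array([0.1, 0.7, 0.2, 0.9])

def sequence_model_scores(statements):   # stub for the sequence (token) model
    return np.array([0.2, 0.6, 0.1, 0.8])

statements = ["int n = read();", "char buf[8];", "log(n);", "strcpy(buf, src);"]
combined = 0.5 * graph_model_scores(statements) + 0.5 * sequence_model_scores(statements)
print(statements[int(np.argmax(combined))])   # statement most likely vulnerable
```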
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
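As a generic stand-in for such a defensive mechanism (not necessarily the one proposed in the paper), the sketch below aggregates client updates with a coordinate-wise median instead of a plain average, so a few low-quality updates cannot dominate the global model.

```python
# Sketch: robust federated aggregation via a coordinate-wise median, limiting the
# influence of unreliable clients. A generic defense, not the paper's mechanism.
import numpy as np

def aggregate_median(client_updates):
    """client_updates: list of 1-D parameter vectors, one per client."""
    return np.median(np.stack(client_updates), axis=0)

good = [np.random.normal(0.0, 0.1, size=5) for _ in range(8)]
bad = [np.full(5, 50.0)]                      # one unreliable/poisoned client
print(aggregate_median(good + bad))           # stays close to the honest updates
```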
arXiv Detail & Related papers (2021-05-10T08:02:27Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy, precision, false-alarm rate, and recall.
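A minimal sketch of the general pattern (Bayesian optimization tuning an anomaly detector's hyperparameters) follows, assuming scikit-optimize is available; the detector, search space, and synthetic data are placeholders rather than the paper's setup on ISCX 2012.

```python
# Sketch: Bayesian optimization (scikit-optimize, assumed available) tuning an
# IsolationForest's hyperparameters to maximize F1 on labeled validation data.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score
from skopt import gp_minimize
from skopt.space import Integer, Real

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (rng.random(2000) < 0.05).astype(int)    # ~5% anomalies (synthetic stand-in)
X[y == 1] += 2.5

def objective(params):
    n_estimators, contamination = params
    iso = IsolationForest(n_estimators=int(n_estimators),
                          contamination=float(contamination),
                          random_state=0).fit(X)
    pred = (iso.predict(X) == -1).astype(int)
    return -f1_score(y, pred)                # minimize the negative F1

space = [Integer(50, 300), Real(0.01, 0.1)]  # assumed search space
result = gp_minimize(objective, space, n_calls=20, random_state=0)
print("best params:", result.x, "best F1:", -result.fun)
```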
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.