Predicting Organizational Cybersecurity Risk: A Deep Learning Approach
- URL: http://arxiv.org/abs/2012.14425v1
- Date: Sat, 26 Dec 2020 01:15:34 GMT
- Title: Predicting Organizational Cybersecurity Risk: A Deep Learning Approach
- Authors: Benjamin M. Ampel
- Abstract summary: Hackers use exploits found on hacker forums to carry out complex cyberattacks.
We propose a hacker forum entity recognition framework (HackER) to identify exploits and the entities that the exploits target.
HackER then uses a bidirectional long short-term memory model (BiLSTM) to create a predictive model for what companies will be targeted by exploits.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cyberattacks conducted by malicious hackers cause irreparable damage to
organizations, governments, and individuals every year. Hackers use exploits
found on hacker forums to carry out complex cyberattacks, making exploration of
these forums vital. We propose a hacker forum entity recognition framework
(HackER) to identify exploits and the entities that the exploits target. HackER
then uses a bidirectional long short-term memory model (BiLSTM) to create a
predictive model for what companies will be targeted by exploits. The algorithm
is evaluated on a manually labeled gold-standard test dataset, with accuracy,
precision, recall, and F1-score as metrics. We compare our model against
state-of-the-art classical machine learning and
deep learning benchmark models. Results show that our proposed HackER BiLSTM
model outperforms all classical machine learning and deep learning models in
F1-score (79.71%). These results are statistically significant at the 0.05 level or lower
for all benchmarks except LSTM. The results of this preliminary work suggest our
model can help key cybersecurity stakeholders (e.g., analysts, researchers,
educators) identify what type of business an exploit is targeting.
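The paper does not publish an implementation, but the BiLSTM classifier described in the abstract maps onto a standard text-classification architecture. Below is a minimal sketch in Keras under stated assumptions: forum posts are integer-encoded and padded, the labels are business categories, and all hyperparameters (vocabulary size, embedding dimension, layer widths) are illustrative guesses, not values from the paper.

```python
# Minimal sketch of a BiLSTM text classifier along the lines the abstract
# describes. Hyperparameters are illustrative assumptions, not the paper's.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed tokenizer vocabulary size
NUM_CLASSES = 10      # assumed number of target business categories

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),          # token embeddings
    layers.Bidirectional(layers.LSTM(64)),      # reads each post in both directions
    layers.Dropout(0.5),                        # regularization
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: integer-encoded, padded posts; y_train: integer class labels.
# model.fit(x_train, y_train, validation_split=0.1, epochs=5, batch_size=32)
```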
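Likewise, the evaluation the abstract describes (accuracy, precision, recall, and F1 against a manually labeled gold-standard test set) corresponds to standard multi-class metrics. A sketch with scikit-learn, assuming `y_true` holds the gold labels and `y_pred` a model's predictions; macro averaging is an assumption, as the paper does not state the averaging mode:

```python
# Sketch of the evaluation described in the abstract: accuracy, precision,
# recall, and F1 on a held-out, manually labeled gold-standard test set.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Return the four metrics reported in the abstract for one model."""
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0  # averaging mode assumed
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: score the BiLSTM and a benchmark model on the same test set.
# bilstm_scores = evaluate(y_test, bilstm_preds)
# baseline_scores = evaluate(y_test, svm_preds)
```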
Related papers
- EUREKHA: Enhancing User Representation for Key Hackers Identification in Underground Forums [1.5192294544599656]
Underground forums serve as hubs for cybercriminal activities, offering a space for anonymity and evasion of online oversight.
Identifying the key instigators behind these operations is essential but remains a complex challenge.
This paper presents a novel method called EUREKHA, designed to identify these key hackers by modeling each user as a textual sequence.
arXiv Detail & Related papers (2024-11-08T11:09:45Z)
- Verification of Machine Unlearning is Fragile [48.71651033308842]
We introduce two novel adversarial unlearning processes capable of circumventing both types of verification strategies.
This study highlights the vulnerabilities and limitations in machine unlearning verification, paving the way for further research into the safety of machine unlearning.
arXiv Detail & Related papers (2024-08-01T21:37:10Z)
- Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions [2.243674903279612]
State-of-the-art machine learning techniques can predict functions with possible security vulnerabilities in JavaScript programs.
The best-performing algorithm was KNN, which produced a model for predicting vulnerable functions with an F-measure of 0.76.
Deep learning, tree- and forest-based classifiers, and SVM were competitive, with F-measures over 0.70.
arXiv Detail & Related papers (2024-05-12T08:23:42Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness [1.4932549821542682]
This study surveys the performance of ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition tasks.
In binary classification experiments, the commercial GPT-4 model achieved an F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90.
This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER to effectively replace specially trained models.
arXiv Detail & Related papers (2024-01-26T13:15:24Z)
- Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines.
Academic research is often restricted to public datasets on the order of ten thousand samples.
We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
- SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models [74.58014281829946]
We analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on public models.
Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models.
arXiv Detail & Related papers (2023-10-19T11:49:22Z)
- Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly with a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attack (GIA), which aims to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), which is a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z)
- Detection of Malicious Websites Using Machine Learning Techniques [0.0]
K-Nearest Neighbor is the only model that performs consistently well across datasets.
Other models such as Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines also consistently outperform a baseline model of predicting every link as malicious.
arXiv Detail & Related papers (2022-09-13T13:48:31Z)
- Insider Detection using Deep Autoencoder and Variational Autoencoder Neural Networks [2.5234156040689237]
Insider attacks are one of the most challenging cybersecurity issues for companies, businesses and critical infrastructures.
In this paper, we aim to address this issue by using the deep learning algorithms Autoencoder and Variational Autoencoder.
We will especially investigate the usefulness of applying these algorithms to automatically defend against potential internal threats, without human intervention.
arXiv Detail & Related papers (2021-09-06T16:08:51Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.