Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
- URL: http://arxiv.org/abs/2401.15127v3
- Date: Fri, 19 Apr 2024 09:40:04 GMT
- Title: Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness
- Authors: Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira
- Abstract summary: This study surveys the performance of ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition tasks.
In binary classification experiments, the commercial GPT-4 model achieved an acceptable F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90.
This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER to effectively replace specially trained models.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge sharing about emerging threats is crucial in the rapidly advancing field of cybersecurity and forms the foundation of Cyber Threat Intelligence (CTI). In this context, Large Language Models are becoming increasingly significant in cybersecurity, presenting a wide range of opportunities. This study surveys the performance of the ChatGPT, GPT4all, Dolly, Stanford Alpaca, Alpaca-LoRA, Falcon, and Vicuna chatbots in binary classification and Named Entity Recognition (NER) tasks performed using Open Source INTelligence (OSINT). We use well-established data collected in previous research from Twitter to assess the competitiveness of these chatbots compared to specialized models trained for those tasks. In binary classification experiments, the commercial GPT-4 chatbot achieved an acceptable F1 score of 0.94, and the open-source GPT4all model achieved an F1 score of 0.90. However, for cybersecurity entity recognition, all evaluated chatbots have limitations and are less effective. This study demonstrates the capability of chatbots for OSINT binary classification and shows that they require further improvement in NER to effectively replace specially trained models. Our results shed light on the limitations of LLM chatbots compared to specialized models, and can help researchers improve chatbot technology with the objective of reducing the effort required to integrate machine learning into OSINT-based CTI tools.
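The binary-classification setup described above can be reproduced in outline: prompt a chatbot to label each tweet as cybersecurity-relevant or not, then score the predictions against the annotated ground truth with the F1 metric the paper reports. A minimal sketch follows; `query_chatbot` is a hypothetical stand-in for whichever model API is under test (GPT-4, GPT4all, ...), and the prompt wording is illustrative, not the paper's exact template.

```python
from sklearn.metrics import f1_score

def query_chatbot(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot API call (GPT-4, GPT4all, ...)."""
    raise NotImplementedError

def classify_tweet(tweet: str) -> int:
    # Illustrative zero-shot prompt; the paper's exact wording may differ.
    prompt = (
        "Answer with a single word, yes or no. "
        f"Is the following tweet about a cybersecurity threat?\n\n{tweet}"
    )
    answer = query_chatbot(prompt).strip().lower()
    return 1 if answer.startswith("yes") else 0

def evaluate(tweets: list[str], labels: list[int]) -> float:
    """Score chatbot labels against annotated ground truth with F1."""
    predictions = [classify_tweet(t) for t in tweets]
    return f1_score(labels, predictions)
```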
Related papers
- IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery [10.937956959186472]
IntellBot is an advanced cyber security chatbot built on top of cutting-edge technologies like Large Language Models and LangChain.
It gathers information from diverse data sources to create a comprehensive knowledge base covering known vulnerabilities, recent cyber attacks, and emerging threats.
It delivers tailored responses, serving as a primary hub for cyber security insights.
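The retrieval-augmented pattern IntellBot describes (index a security knowledge base, retrieve the passages most relevant to a question, and condition the LLM's answer on them) can be sketched without committing to LangChain's frequently changing API. In the sketch below, `embed` and `llm` are hypothetical stand-ins for an embedding model and a chat model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call (e.g. a sentence-transformer model)."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def answer(question: str, knowledge_base: list[str], k: int = 3) -> str:
    # Rank knowledge-base passages by cosine similarity to the question.
    q = embed(question)
    scores = []
    for doc in knowledge_base:
        d = embed(doc)
        scores.append(float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    context = "\n\n".join(knowledge_base[i] for i in top)
    # Ground the answer in the retrieved CTI passages.
    return llm(f"Using only this context:\n{context}\n\nAnswer: {question}")
```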
arXiv Detail & Related papers (2024-11-08T09:40:53Z)
- A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets [4.311626046942916]
We present an automated transformer-based approach to augment software engineering datasets.
We evaluate the impact of using the augmentation approach on the Rasa NLU's performance using three software engineering datasets.
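The augmentation idea (generate paraphrases of existing user queries with a transformer so that an intent classifier such as Rasa NLU sees more varied training phrases) can be sketched with the Hugging Face pipeline API. The checkpoint name below is a placeholder for any paraphrase-tuned seq2seq model, not the one the paper uses.

```python
from transformers import pipeline

# Placeholder checkpoint: substitute any paraphrase-tuned seq2seq model.
paraphraser = pipeline("text2text-generation", model="your/paraphrase-model")

def augment(query: str, n: int = 3) -> list[str]:
    """Return n paraphrases of a training query to enlarge the dataset."""
    outputs = paraphraser(
        f"paraphrase: {query}",   # common T5-style task prefix
        num_return_sequences=n,
        do_sample=True,           # sampling yields diverse rewrites
        max_length=64,
    )
    return [o["generated_text"] for o in outputs]

# Each paraphrase keeps the intent label of the query it came from:
# augmented = [(p, intent) for q, intent in dataset for p in augment(q)]
```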
arXiv Detail & Related papers (2024-07-16T17:48:44Z)
- Analysis of the User Perception of Chatbots in Education Using A Partial Least Squares Structural Equation Modeling Approach [0.0]
Key behavior-related aspects, such as Optimism, Innovativeness, Discomfort, Insecurity, Transparency, Ethics, Interaction, Engagement, and Accuracy, were studied.
Results showed that Optimism and Innovativeness are positively associated with Perceived Ease of Use (PEOU) and Perceived Usefulness (PU).
arXiv Detail & Related papers (2023-11-07T00:44:56Z)
- Generative Input: Towards Next-Generation Input Methods Paradigm [49.98958865125018]
We propose a novel Generative Input paradigm named GeneInput.
It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results.
The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task.
arXiv Detail & Related papers (2023-11-02T12:01:29Z)
- Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education [0.0]
This paper explores the potential integration of large language models (LLMs) and chatbots into graduate engineering education.
We develop a question bank from the course material and assess the bot's ability to provide accurate, insightful responses.
We demonstrate how powerful plugins like Wolfram Alpha for mathematical problem-solving and code interpretation can significantly extend the bot's capabilities.
arXiv Detail & Related papers (2023-09-09T13:37:22Z)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents [56.104476412839944]
Large Language Models (LLMs) have demonstrated remarkable zero-shot generalization across various language-related tasks.
This paper investigates generative LLMs for relevance ranking in Information Retrieval (IR).
To address concerns about data contamination of LLMs, we collect a new test set called NovelEval.
To improve efficiency in real-world applications, we delve into the potential for distilling the ranking capabilities of ChatGPT into small specialized models.
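The re-ranking setup can be sketched as listwise prompting: show the model the query and a numbered list of candidate passages, then ask it to return the passage numbers in order of relevance. `llm` is again a hypothetical completion call, and the prompt format is illustrative rather than the paper's exact template.

```python
import re

def llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def rerank(query: str, passages: list[str]) -> list[str]:
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        f"Query: {query}\n\nPassages:\n{numbered}\n\n"
        "Rank the passages by relevance to the query. "
        "Answer with the passage numbers only, most relevant first, e.g. 2 > 1 > 3."
    )
    order = [int(n) - 1 for n in re.findall(r"\d+", llm(prompt))]
    # Keep the model's ordering, then fall back to the original order
    # for anything it omitted or hallucinated out of range.
    seen = [i for i in order if 0 <= i < len(passages)]
    rest = [i for i in range(len(passages)) if i not in seen]
    return [passages[i] for i in seen + rest]
```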
arXiv Detail & Related papers (2023-04-19T10:16:03Z)
- A Categorical Archive of ChatGPT Failures [47.64219291655723]
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures has been lacking; providing one is the focus of this study.
arXiv Detail & Related papers (2023-02-06T04:21:59Z)
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
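As a concrete instance of the unsupervised side of this comparison, an isolation forest can flag anomalous records without labels. This is a generic sketch with synthetic stand-in features, not the paper's pipeline or dataset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in feature matrix: rows are network-flow records (e.g. duration,
# bytes sent, bytes received); real features come from the dataset at hand.
normal = rng.normal(0.0, 1.0, size=(950, 3))
attacks = rng.normal(5.0, 1.0, size=(50, 3))
X = np.vstack([normal, attacks])

# contamination = expected fraction of anomalies; a tuning choice.
detector = IsolationForest(contamination=0.05, random_state=0).fit(X)
flags = detector.predict(X)          # -1 marks suspected anomalies
print(f"flagged {int((flags == -1).sum())} of {len(X)} records")
```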
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
- Predicting Organizational Cybersecurity Risk: A Deep Learning Approach [0.0]
Hackers use exploits found on hacker forums to carry out complex cyberattacks.
We propose a hacker forum entity recognition framework (HackER) to identify exploits and the entities that the exploits target.
HackER then uses a bidirectional long short-term memory model (BiLSTM) to create a predictive model for what companies will be targeted by exploits.
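The BiLSTM at the core of HackER can be sketched in PyTorch: token embeddings feed a bidirectional LSTM whose outputs are pooled into a prediction over candidate targets. The dimensions and classification head below are illustrative placeholders, not the paper's configuration.

```python
import torch
from torch import nn

class BiLSTMClassifier(nn.Module):
    """Illustrative BiLSTM text classifier; all sizes are placeholders."""

    def __init__(self, vocab_size: int, num_classes: int,
                 embed_dim: int = 128, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        # Forward and backward directions are concatenated: 2 * hidden.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(self.embed(token_ids))
        return self.head(out.mean(dim=1))  # mean-pool over the sequence

# Batch of 2 texts, 40 tokens each -> logits over 5 candidate targets.
logits = BiLSTMClassifier(vocab_size=10_000, num_classes=5)(
    torch.randint(0, 10_000, (2, 40)))
```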
arXiv Detail & Related papers (2020-12-26T01:15:34Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
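InfoBERT's actual objective adds information-theoretic regularizers during fine-tuning. As a simpler illustration of robust fine-tuning in the same spirit, and explicitly not InfoBERT's method, the sketch below applies a one-step FGSM-style perturbation to input embeddings and trains on the perturbed batch; `model` is assumed to map embeddings to logits.

```python
import torch
from torch import nn

def adversarial_step(model: nn.Module, embeds: torch.Tensor,
                     labels: torch.Tensor, loss_fn: nn.Module,
                     epsilon: float = 1e-2) -> torch.Tensor:
    """One FGSM-style robust-training step on input embeddings.

    A generic illustration of embedding-space adversarial fine-tuning,
    not InfoBERT's information-theoretic objective.
    """
    embeds = embeds.clone().detach().requires_grad_(True)
    loss = loss_fn(model(embeds), labels)
    grad, = torch.autograd.grad(loss, embeds)
    # Perturb embeddings in the direction that most increases the loss,
    # then train the model on the perturbed inputs.
    perturbed = (embeds + epsilon * grad.sign()).detach()
    return loss_fn(model(perturbed), labels)  # backpropagate this loss
```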
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
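The combination scheme is straightforward: each classifier specializes in one bot class, and an account is scored by the maximum probability any specialist assigns to it. A minimal sketch under that reading, with scikit-learn random forests as stand-ins for the specialists:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_specialists(X, y_by_class):
    """One binary classifier per bot class (spam bots, fake followers, ...).

    Each label vector y marks accounts of that class as 1, all others as 0.
    """
    return [RandomForestClassifier(random_state=0).fit(X, y)
            for y in y_by_class]

def bot_score(specialists, X):
    # Maximum rule: take the highest bot probability over all specialists.
    probs = np.column_stack([clf.predict_proba(X)[:, 1]
                             for clf in specialists])
    return probs.max(axis=1)   # score >= threshold => flag as bot
```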
arXiv Detail & Related papers (2020-06-11T22:59:59Z)