Related papers: Automated CVE Analysis: Harnessing Machine Learning In Designing Question-Answering Models For Cybersecurity Information Extraction

Automated CVE Analysis: Harnessing Machine Learning In Designing Question-Answering Models For Cybersecurity Information Extraction

URL: http://arxiv.org/abs/2412.16484v1
Date: Sat, 21 Dec 2024 04:50:45 GMT
Title: Automated CVE Analysis: Harnessing Machine Learning In Designing Question-Answering Models For Cybersecurity Information Extraction
Authors: Tanjim Bin Faruk,
Abstract summary: Question Answering (QA) systems play a pivotal role in the mapping of relationships between various data points.<n>In the cybersecurity context, QA systems encounter unique challenges due to the need to interpret and answer questions based on a wide array of domain-specific information.<n>This paper presents a novel dataset and describes a machine learning model trained on this dataset for the QA task.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The vast majority of cybersecurity information is unstructured text, including critical data within databases such as CVE, NVD, CWE, CAPEC, and the MITRE ATT&CK Framework. These databases are invaluable for analyzing attack patterns and understanding attacker behaviors. Creating a knowledge graph by integrating this information could unlock significant insights. However, processing this large amount of data requires advanced deep-learning techniques. A crucial step towards building such a knowledge graph is developing a robust mechanism for automating the extraction of answers to specific questions from the unstructured text. Question Answering (QA) systems play a pivotal role in this process by pinpointing and extracting precise information, facilitating the mapping of relationships between various data points. In the cybersecurity context, QA systems encounter unique challenges due to the need to interpret and answer questions based on a wide array of domain-specific information. To tackle these challenges, it is necessary to develop a cybersecurity-specific dataset and train a machine learning model on it, aimed at enhancing the understanding and retrieval of domain-specific information. This paper presents a novel dataset and describes a machine learning model trained on this dataset for the QA task. It also discusses the model's performance and key findings in a manner that maintains a balance between formality and accessibility.

Related papers

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [68.46693401421923]
WebShaper systematically formalizes IS tasks through set theory.<n>WebShaper achieves state-of-the-art performance among open-sourced IS agents on GAIA and WebWalkerQA benchmarks.
arXiv Detail & Related papers (2025-07-20T17:53:37Z)
MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation [0.0]
Existing datasets often lack diversity, comprehensive labelling, and the complexity necessary for effective machine learning and agentic AI training.<n>To fill this gap, we developed a systematic approach for generating a dataset that combines automated malware execution in controlled virtual environments with dynamic monitoring tools.<n>The resulting dataset comprises clean and infected memory snapshots across multiple malware families and operating systems, capturing detailed behavioural and environmental features.
arXiv Detail & Related papers (2025-07-05T10:45:45Z)
Data Therapist: Eliciting Domain Knowledge from Subject Matter Experts Using Large Language Models [17.006423792670414]
We present the Data Therapist, a web-based tool that helps domain experts externalize implicit knowledge through a mixed-initiative process. The resulting structured knowledge base can inform both human and automated visualization design.
arXiv Detail & Related papers (2025-05-01T11:10:17Z)
MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large language Model [1.33134751838052]
This paper introduces the Multidimensional Data Storytelling Framework (MDSF) based on large language models for automated insight generation and context-aware storytelling. The framework incorporates advanced preprocessing techniques, augmented analysis algorithms, and a unique scoring mechanism to identify and prioritize actionable insights.
arXiv Detail & Related papers (2025-01-02T02:35:38Z)
Adversarial Challenges in Network Intrusion Detection Systems: Research Insights and Future Prospects [0.33554367023486936]
This paper provides a comprehensive review of machine learning-based Network Intrusion Detection Systems (NIDS) We critically examine existing research in NIDS, highlighting key trends, strengths, and limitations. We discuss emerging challenges in the field and offer insights for the development of more robust and resilient NIDS.
arXiv Detail & Related papers (2024-09-27T13:27:29Z)
Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction [43.045596895389345]
We introduce a modern, semi-automated approach for creating datasets, encompassing tasks such as KBQA, Machine Reading (MRC), and Information Retrieval (IR) We provide a comprehensive implementation, insightful findings, detailed statistics, and evaluation of baseline models.
arXiv Detail & Related papers (2024-08-05T09:23:49Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA. We first augment the existing data via deliberate perturbations on either the image or question. We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits. In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
Federated Edge Learning : Design Issues and Challenges [1.916348196696894]
Federated Learning (FL) is a distributed machine learning technique, where each device contributes to the learning model by independently computing the gradient based on its local training data. implementing FL at the network edge is challenging due to system and data heterogeneity and resources constraints. This article proposes a general framework for the data-aware scheduling as a guideline for future research directions.
arXiv Detail & Related papers (2020-08-31T19:56:36Z)
Graph signal processing for machine learning: A review and new perspectives [57.285378618394624]
We review a few important contributions made by GSP concepts and tools, such as graph filters and transforms, to the development of novel machine learning algorithms. We discuss exploiting data structure and relational priors, improving data and computational efficiency, and enhancing model interpretability. We provide new perspectives on future development of GSP techniques that may serve as a bridge between applied mathematics and signal processing on one side, and machine learning and network science on the other.
arXiv Detail & Related papers (2020-07-31T13:21:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.