A Deep Learning Approach for Ontology Enrichment from Unstructured Text
- URL: http://arxiv.org/abs/2112.08554v1
- Date: Thu, 16 Dec 2021 01:32:21 GMT
- Title: A Deep Learning Approach for Ontology Enrichment from Unstructured Text
- Authors: Lalit Mohan Sanagavarapu, Vivek Iyer and Raghu Reddy
- Abstract summary: Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent knowledge and perform security analytics.
Ontology enrichment algorithms based on natural language processing and ML models have issues with contextual extraction of concepts in words, phrases, and sentences.
Bidirectional LSTMs trained on a large DBpedia dataset and Wikipedia corpus of 2.8 GB, along with the Universal Sentence Encoder, are deployed to enrich an ISO 27001-based information security ontology.
- Score: 2.932750332087746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information Security in the cyber world is a major cause for concern, with a
significant increase in the number of attack surfaces. Existing information on
vulnerabilities, attacks, controls, and advisories available on the web
provides an opportunity to represent knowledge and perform security analytics
to mitigate some of the concerns. Representing security knowledge in the form
of ontology facilitates anomaly detection, threat intelligence, reasoning and
relevance attribution of attacks, and many more. This necessitates dynamic and
automated enrichment of information security ontologies. However, existing
ontology enrichment algorithms based on natural language processing and ML
models have issues with contextual extraction of concepts in words, phrases,
and sentences. This motivates the need for sequential Deep Learning
architectures that traverse through dependency paths in text and extract
embedded vulnerabilities, threats, controls, products, and other
security-related concepts and instances from learned path representations. In
the proposed approach, Bidirectional LSTMs trained on a large DBpedia dataset
and Wikipedia corpus of 2.8 GB, along with the Universal Sentence Encoder, are
deployed to enrich an ISO 27001-based information security ontology. The model
is trained and tested in a high-performance computing (HPC) environment to
handle Wiki text dimensionality. The approach yielded a test accuracy of over
80% when tested with knocked-out concepts from the ontology and web page
instances to validate its robustness.
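The knocked-out-concept test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`knock_out`, `recovery_rate`, `toy_enricher`), the toy concept list, and the 50% hold-out fraction are all assumptions made for illustration, and the trained Bidirectional LSTM is replaced by a trivial stand-in that proposes any term it sees in text.

```python
import random


def knock_out(concepts, fraction, seed=0):
    """Split concepts into (kept, held_out) by removing a random fraction."""
    rng = random.Random(seed)
    shuffled = sorted(concepts)  # sort first so the split is reproducible
    rng.shuffle(shuffled)
    n_removed = max(1, int(len(shuffled) * fraction))
    return set(shuffled[n_removed:]), set(shuffled[:n_removed])


def recovery_rate(held_out, predicted):
    """Fraction of held-out concepts that the enrichment step proposed again."""
    if not held_out:
        return 0.0
    return len(held_out & set(predicted)) / len(held_out)


def toy_enricher(kept, text_terms):
    """Hypothetical stand-in for a trained model: proposes every term
    found in text that is not already present in the ontology."""
    return {term for term in text_terms if term not in kept}


# Toy data, invented for this example only.
concepts = {"firewall", "malware", "phishing", "encryption"}
kept, held_out = knock_out(concepts, fraction=0.5)
predicted = toy_enricher(kept, text_terms=concepts)
rate = recovery_rate(held_out, predicted)
```

With a real model, `toy_enricher` would be replaced by the model's predictions over candidate terms extracted from text, and `rate` would measure how many deliberately removed concepts the model manages to restore.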
Related papers
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity [49.657358248788945]
Textual descriptions in cyber threat intelligence (CTI) reports are rich sources of knowledge about cyber threats.
Current CTI extraction methods lack flexibility and generalizability, often resulting in inaccurate and incomplete knowledge extraction.
We propose CTINexus, a novel framework leveraging optimized in-context learning (ICL) of large language models.
arXiv Detail & Related papers (2024-10-28T14:18:32Z) - Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection [4.629503670145618]
Software vulnerabilities are a challenge in cybersecurity.
DeepEXE is an agent-based implicit neural network that mimics the execution path of a program.
We show that DeepEXE is an accurate and efficient method and outperforms the state-of-the-art vulnerability detection methods.
arXiv Detail & Related papers (2024-04-03T22:07:50Z) - HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for
Root Cause Analysis with GPT-assisted Mitigation Suggestion [3.847218857469107]
We present the HW-V2W-Map Framework, which is a Machine Learning (ML) framework focusing on hardware vulnerabilities and Internet of Things (IoT) security.
The architecture that we have proposed incorporates an Ontology-driven Storytelling framework, which automates the process of updating the Ontology.
Our proposed framework utilized Generative Pre-trained Transformer (GPT) Large Language Models (LLMs) to provide mitigation suggestions.
arXiv Detail & Related papers (2023-12-21T02:14:41Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - DCDetector: An IoT terminal vulnerability mining system based on
distributed deep ensemble learning under source code representation [2.561778620560749]
The goal of the research is to intelligently detect vulnerabilities in source codes of high-level languages such as C/C++.
This enables us to propose a code representation of sensitive sentence-related slices of source code, and to detect vulnerabilities by designing a distributed deep ensemble learning model.
Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
arXiv Detail & Related papers (2022-11-29T14:19:14Z) - Towards Automated Classification of Attackers' TTPs by combining NLP
with ML Techniques [77.34726150561087]
We evaluate and compare different Natural Language Processing (NLP) and machine learning techniques used for security information extraction in research.
Based on our investigations we propose a data processing pipeline that automatically classifies unstructured text according to attackers' tactics and techniques.
arXiv Detail & Related papers (2022-07-18T09:59:21Z) - Multi-features based Semantic Augmentation Networks for Named Entity
Recognition in Threat Intelligence [7.321994923276344]
We propose a semantic augmentation method which incorporates different linguistic features to enrich the representation of input tokens.
In particular, we encode and aggregate the constituent feature, morphological feature and part of speech feature for each input token to improve the robustness of the method.
We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-07-01T06:55:12Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - OntoEnricher: A Deep Learning Approach for Ontology Enrichment from
Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns.
This necessitates dynamic and automated enrichment of information security ontologies.
Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
arXiv Detail & Related papers (2021-02-08T09:43:05Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.