Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization
- URL: http://arxiv.org/abs/2403.16222v2
- Date: Tue, 26 Mar 2024 15:28:27 GMT
- Title: Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization
- Authors: Ryan Barron, Maksim E. Eren, Manish Bhattarai, Selma Wanna, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas, Cynthia Matuszek
- Abstract summary: Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers.
Knowledge Graphs (KGs) serve as a means to store factual information in a structured manner.
One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text.
- Score: 8.158794536515245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers. As this textual data continues to expand, the importance of document organization methods becomes increasingly crucial for extracting actionable insights hidden within large text datasets. Knowledge Graphs (KGs) serve as a means to store factual information in a structured manner, providing explicit, interpretable knowledge that includes domain-specific information from the cybersecurity scientific literature. One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text. In this paper, we address this topic and introduce a method for building a multi-modal KG by extracting structured ontology from scientific papers. We demonstrate this concept in the cybersecurity domain. One modality of the KG represents observable information from the papers, such as the categories in which they were published or the authors. The second modality uncovers latent (hidden) patterns of text extracted through hierarchical and semantic non-negative matrix factorization (NMF), such as named entities, topics or clusters, and keywords. We illustrate this concept by consolidating more than two million scientific papers uploaded to arXiv into the cyber-domain, using hierarchical and semantic NMF, and by building a cyber-domain-specific KG.
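The abstract splits the KG into two modalities. One holds observable metadata (categories, authors). The other holds latent structure (topics, keywords) mined by factorizing the text.

The following is a minimal sketch of that split on a made-up four-paper corpus. It is not the authors' pipeline: it substitutes plain, single-level NMF fitted with Lee-Seung multiplicative updates for their hierarchical and semantic NMF. Everything else is a hypothetical choice for illustration: the toy papers, the relation names (`in_category`, `authored_by`, `has_topic`, `has_keyword`), `k=2`, and the top-2 keyword cutoff.

```python
import random

# Hypothetical toy corpus: ids, authors, categories and texts are invented.
PAPERS = [
    {"id": "p1", "authors": ["A. Smith"], "category": "cs.CR",
     "text": "malware detection malware classifier"},
    {"id": "p2", "authors": ["B. Jones"], "category": "cs.LG",
     "text": "malware classifier evasion"},
    {"id": "p3", "authors": ["A. Smith", "C. Lee"], "category": "cs.CR",
     "text": "encryption key exchange protocol"},
    {"id": "p4", "authors": ["C. Lee"], "category": "cs.CR",
     "text": "key exchange encryption lattice"},
]


def matmul(A, B):
    """Dense matrix product of two lists-of-lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=300, seed=0, eps=1e-9):
    """Plain NMF, V ~ W H, via Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in V]     # m x k, positive
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(k)]  # k x n, positive
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(*rows)] for rows in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def frob_error(V, W, H):
    """Squared Frobenius norm of the residual V - W H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))


def build_kg(papers, k=2, top_n=2):
    docs = [p["text"].split() for p in papers]
    vocab = sorted({w for d in docs for w in d})
    V = [[d.count(w) for w in vocab] for d in docs]  # document-term counts
    W, H = nmf(V, k)
    triples = set()
    # Observable modality: metadata read directly from each paper record.
    for p in papers:
        triples.add((p["id"], "in_category", p["category"]))
        for author in p["authors"]:
            triples.add((p["id"], "authored_by", author))
    # Latent modality: dominant NMF component per paper, top terms per component.
    for p, row in zip(papers, W):
        topic = max(range(k), key=lambda t: row[t])
        triples.add((p["id"], "has_topic", f"topic_{topic}"))
    for t, hrow in enumerate(H):
        for j in sorted(range(len(vocab)), key=lambda c: -hrow[c])[:top_n]:
            triples.add((f"topic_{t}", "has_keyword", vocab[j]))
    return triples, V, W, H


if __name__ == "__main__":
    kg, _, _, _ = build_kg(PAPERS)
    for triple in sorted(kg):
        print(triple)
```

One way to make this hierarchical is to re-run `nmf` on the documents assigned to each topic to obtain subtopics. That is an assumption, not necessarily the paper's scheme, and the semantic constraints the abstract mentions are not modeled here.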
Related papers
- Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding.
We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations.
We release the resulting coreference chains under a Creative Commons license, along with the code.
(2024-10-22T14:30:40Z)
- Domain Adaptation for Large-Vocabulary Object Detectors [103.16365373806829]
This paper presents KGD, a Knowledge Graph Distillation technique that exploits the implicit knowledge graphs (KGs) in CLIP to effectively adapt large-vocabulary object detectors (LVDs) to various downstream domains.
Experiments on multiple widely adopted detection benchmarks show that KGD consistently outperforms state-of-the-art methods by large margins.
(2024-01-13T03:51:18Z)
- Object Recognition from Scientific Document based on Compartment Refinement Framework [2.699900017799093]
It has become increasingly important to extract valuable information from vast resources efficiently.
Current data extraction methods for scientific documents typically use rule-based (RB) or machine learning (ML) approaches.
We propose a new document layout analysis framework called CTBR (Compartment & Text Blocks Refinement).
(2023-12-14T15:36:49Z)
- Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
(2023-12-12T22:27:29Z)
- SKG: A Versatile Information Retrieval and Analysis Framework for Academic Papers with Semantic Knowledge Graphs [9.668240269886413]
We propose a Semantic Knowledge Graph that integrates semantic concepts from abstracts and other meta-information to represent the corpus.
The SKG can support various semantic queries in academic literature thanks to the high diversity and rich information content stored within.
(2023-06-07T20:16:08Z)
- Semantic Similarity Measure of Natural Language Text through Machine Learning and a Keyword-Aware Cross-Encoder-Ranking Summarizer -- A Case Study Using UCGIS GIS&T Body of Knowledge [2.4909170697740968]
GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics.
This research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text.
It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications.
(2023-05-17T01:17:57Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
(2022-09-20T09:16:05Z)
- Multi-Document Scientific Summarization from a Knowledge Graph-Centric View [9.579482432715261]
We present KGSum, a multi-document scientific summarization (MDSS) model centred on knowledge graphs in both the encoding and decoding processes.
Specifically, in the encoding process, two graph-based modules are proposed to incorporate knowledge graph information into paper encoding.
In the decoding process, we propose a two-stage decoder that first generates the summary's knowledge graph information as descriptive sentences and then generates the final summary.
(2022-09-09T14:20:59Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
(2022-06-15T02:33:10Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicit syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
(2022-01-13T08:25:53Z)
- Relational Learning Analysis of Social Politics using Knowledge Graph Embedding [11.978556412301975]
This paper presents a novel credibility domain-based KG Embedding framework.
It fuses data obtained from heterogeneous sources into a formal, domain-specific KG representation.
The framework also embodies a credibility module to ensure data quality and trustworthiness.
(2020-06-02T14:10:28Z)