Related papers: Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization

URL: http://arxiv.org/abs/2403.16222v2
Date: Tue, 26 Mar 2024 15:28:27 GMT
Title: Cyber-Security Knowledge Graph Generation by Hierarchical Nonnegative Matrix Factorization
Authors: Ryan Barron, Maksim E. Eren, Manish Bhattarai, Selma Wanna, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas, Cynthia Matuszek
Abstract summary: Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers. Knowledge Graphs (KGs) serve as a means to store factual information in a structured manner. One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text.
Score: 8.158794536515245
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Much of human knowledge in cybersecurity is encapsulated within the ever-growing volume of scientific papers. As this textual data continues to expand, the importance of document organization methods becomes increasingly crucial for extracting actionable insights hidden within large text datasets. Knowledge Graphs (KGs) serve as a means to store factual information in a structured manner, providing explicit, interpretable knowledge that includes domain-specific information from the cybersecurity scientific literature. One of the challenges in constructing a KG from scientific literature is the extraction of ontology from unstructured text. In this paper, we address this topic and introduce a method for building a multi-modal KG by extracting structured ontology from scientific papers. We demonstrate this concept in the cybersecurity domain. One modality of the KG represents observable information from the papers, such as the categories in which they were published or the authors. The second modality uncovers latent (hidden) patterns of text extracted through hierarchical and semantic non-negative matrix factorization (NMF), such as named entities, topics or clusters, and keywords. We illustrate this concept by consolidating more than two million scientific papers uploaded to arXiv into the cyber-domain, using hierarchical and semantic NMF, and by building a cyber-domain-specific KG.
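To make the two modalities concrete, here is a minimal sketch on a toy corpus. It assumes plain (flat) NMF with Lee-Seung multiplicative updates as a stand-in for the hierarchical and semantic NMF described in the paper, and it is not the authors' implementation. The paper IDs, author names, texts, relation names (`hasAuthor`, `inCategory`, `hasTopic`, `hasKeyword`) and helper functions (`nmf`, `build_kg`) are all hypothetical. Observable metadata becomes triples directly, while the latent modality comes from factoring a term-document count matrix V ≈ WH: each column of W ranks terms for one topic (keywords), and each column of H gives a document's topic weights.

```python
import random

def nmf(V, k, iters=1000, seed=0, eps=1e-9):
    """Approximate a nonnegative m x n matrix V as W (m x k) times H (k x n)
    using Lee-Seung multiplicative updates for the Frobenius-norm loss.
    Assumption: flat NMF stands in for the paper's hierarchical/semantic NMF."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]

    def matmul(A, B):
        return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def transpose(A):
        return [list(row) for row in zip(*A)]

    for _ in range(iters):
        Wt = transpose(W)
        # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H

# Hypothetical toy corpus: IDs, authors, and texts are made up for illustration.
PAPERS = {
    "paper_a": {"authors": ["Author A"], "category": "cs.CR",
                "text": "malware detection malware binary analysis"},
    "paper_b": {"authors": ["Author A", "Author B"], "category": "cs.CR",
                "text": "binary malware classification detection"},
    "paper_c": {"authors": ["Author C"], "category": "cs.CR",
                "text": "encryption key cryptography protocol"},
    "paper_d": {"authors": ["Author C", "Author D"], "category": "cs.CR",
                "text": "cryptography key exchange protocol encryption"},
}

def build_kg(papers, k=2, top_n=3):
    """Return a set of (subject, relation, object) triples (hypothetical schema)."""
    vocab = sorted({w for p in papers.values() for w in p["text"].split()})
    ids = sorted(papers)
    # Term-document count matrix: rows are terms, columns are papers.
    V = [[papers[d]["text"].split().count(t) for d in ids] for t in vocab]
    W, H = nmf(V, k)
    triples = set()
    for j, d in enumerate(ids):
        # Observable modality: metadata copied directly from the paper record.
        for author in papers[d]["authors"]:
            triples.add((d, "hasAuthor", author))
        triples.add((d, "inCategory", papers[d]["category"]))
        # Latent modality: the topic with the largest weight in column j of H.
        topic = max(range(k), key=lambda t: H[t][j])
        triples.add((d, "hasTopic", f"topic_{topic}"))
    for t in range(k):
        # Keywords: the terms with the largest weights in column t of W.
        ranked = sorted(range(len(vocab)), key=lambda i: W[i][t], reverse=True)
        for i in ranked[:top_n]:
            triples.add((f"topic_{t}", "hasKeyword", vocab[i]))
    return triples

if __name__ == "__main__":
    for triple in sorted(build_kg(PAPERS)):
        print(triple)
```

On this toy input the malware-themed and cryptography-themed papers use disjoint vocabularies, so a two-topic factorization should group them separately; a real pipeline would use TF-IDF weighting, model selection for k, and the hierarchical decomposition the abstract mentions.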

Related papers

DISRetrieval: Harnessing Discourse Structure for Long Document Retrieval [51.89673002051528]
DISRetrieval is a novel hierarchical retrieval framework that leverages linguistic discourse structure to enhance long-document understanding. Our studies confirm that discourse structure significantly enhances retrieval effectiveness across different document lengths and query types.
arXiv Detail & Related papers (2025-05-26T14:45:12Z)
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information [6.414732533433283]
We propose a two-stage abstractive summarization framework that leverages automatic recognition of structural functions within scientific papers. In the first stage, we standardize chapter titles from numerous scientific papers and construct a large-scale dataset for structural function recognition. In the second stage, we employ Longformer to capture rich contextual relationships across sections and generate context-aware summaries.
arXiv Detail & Related papers (2025-05-20T10:34:45Z)
Data-driven Coreference-based Ontology Building [48.995395445597225]
Coreference resolution is traditionally used as a component in individual document understanding. We take a more global view and explore what we can learn about a domain from the set of all document-level coreference relations. We release the resulting coreference chains under a Creative Commons license, along with the code.
arXiv Detail & Related papers (2024-10-22T14:30:40Z)
Domain Adaptation for Large-Vocabulary Object Detectors [103.16365373806829]
This paper presents KGD, a Knowledge Graph Distillation technique that exploits the implicit knowledge graphs (KGs) in CLIP to effectively adapt large-vocabulary object detectors (LVDs) to various downstream domains. Experiments on multiple widely adopted detection benchmarks show that KGD consistently outperforms the state of the art by large margins.
arXiv Detail & Related papers (2024-01-13T03:51:18Z)
Object Recognition from Scientific Document based on Compartment Refinement Framework [2.699900017799093]
It has become increasingly important to extract valuable information from vast resources efficiently. Current data extraction methods for scientific documents typically use rule-based (RB) or machine learning (ML) approaches. We propose a new document layout analysis framework called CTBR (Compartment & Text Blocks Refinement).
arXiv Detail & Related papers (2023-12-14T15:36:49Z)
Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques. The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
arXiv Detail & Related papers (2023-12-12T22:27:29Z)
SKG: A Versatile Information Retrieval and Analysis Framework for Academic Papers with Semantic Knowledge Graphs [9.668240269886413]
We propose a Semantic Knowledge Graph that integrates semantic concepts from abstracts and other meta-information to represent the corpus. The SKG can support various semantic queries in academic literature thanks to the high diversity and rich information content stored within.
arXiv Detail & Related papers (2023-06-07T20:16:08Z)
Semantic Similarity Measure of Natural Language Text through Machine Learning and a Keyword-Aware Cross-Encoder-Ranking Summarizer -- A Case Study Using UCGIS GIS&T Body of Knowledge [2.4909170697740968]
GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics. This research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications.
arXiv Detail & Related papers (2023-05-17T01:17:57Z)
Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
Multi-Document Scientific Summarization from a Knowledge Graph-Centric View [9.579482432715261]
We present KGSum, a multi-document scientific summarization (MDSS) model centred on knowledge graphs during both the encoding and decoding processes. Specifically, in the encoding process, two graph-based modules are proposed to incorporate knowledge graph information into paper encoding. In the decoding process, we propose a two-stage decoder that first generates the summary's knowledge graph information in the form of descriptive sentences and then generates the final summary.
arXiv Detail & Related papers (2022-09-09T14:20:59Z)
TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo). We first present a flexible heterogeneous semantic network that incorporates high-quality entities. We then introduce two types of external knowledge: structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicit syntactic and contextual information. KGAN captures sentiment feature representations from multiple perspectives, i.e., context-, syntax-, and knowledge-based. Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
Relational Learning Analysis of Social Politics using Knowledge Graph Embedding [11.978556412301975]
This paper presents a novel credibility domain-based KG Embedding framework. It involves capturing a fusion of data obtained from heterogeneous resources into a formal KG representation depicted by a domain. The framework also embodies a credibility module to ensure data quality and trustworthiness.
arXiv Detail & Related papers (2020-06-02T14:10:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.