Generating Knowledge Graphs by Employing Natural Language Processing and
Machine Learning Techniques within the Scholarly Domain
- URL: http://arxiv.org/abs/2011.01103v1
- Date: Wed, 28 Oct 2020 08:31:40 GMT
- Title: Generating Knowledge Graphs by Employing Natural Language Processing and
Machine Learning Techniques within the Scholarly Domain
- Authors: Danilo Dessì, Francesco Osborne, Diego Reforgiato Recupero, Davide
Buscaldi, Enrico Motta
- Abstract summary: We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
Within this research work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
- Score: 1.9004296236396943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The continuous growth of scientific literature brings innovations and, at the
same time, raises new challenges. One of them is related to the fact that its
analysis has become difficult due to the high volume of published papers for
which manual effort for annotations and management is required. Novel
technological infrastructures are needed to help researchers, research policy
makers, and companies to time-efficiently browse, analyse, and forecast
scientific research. Knowledge graphs, i.e., large networks of entities and
relationships, have proved to be an effective solution in this space. Scientific
knowledge graphs focus on the scholarly domain and typically contain metadata
describing research publications such as authors, venues, organizations,
research topics, and citations. However, the current generation of knowledge
graphs lacks an explicit representation of the knowledge presented in the
research papers. As such, in this paper, we present a new architecture that
takes advantage of Natural Language Processing and Machine Learning methods for
extracting entities and relationships from research publications and integrates
them in a large-scale knowledge graph. Within this research work, we i) tackle
the challenge of knowledge extraction by employing several state-of-the-art
Natural Language Processing and Text Mining tools, ii) describe an approach for
integrating entities and relationships generated by these tools, iii) show the
advantage of such a hybrid system over alternative approaches, and iv) as a
chosen use case, generate a scientific knowledge graph including 109,105
triples, extracted from 26,827 abstracts of papers within the Semantic Web
domain. As our approach is general and can be applied to any domain, we expect
that it can facilitate the management, analysis, dissemination, and processing
of scientific knowledge.
Related papers
- Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research [2.1728621449144763]
Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science.
Traditional methods, relying on keyword searches, often fail to uncover valuable insights not explicitly stated in article titles or keywords.
We leverage Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis.
(2024-10-08T05:13:27Z)
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
(2024-06-16T14:49:19Z)
- Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model [16.030268397865264]
This article introduces the Materials Knowledge Graph (MKG), which utilizes advanced natural language processing techniques.
MKG categorizes information into comprehensive labels such as Name, Formula, and Application, structured around a meticulously designed ontology.
By implementing network-based algorithms, MKG not only facilitates efficient link prediction but also significantly reduces reliance on traditional experimental methods.
(2024-04-03T21:46:14Z)
- AceMap: Knowledge Discovery through Academic Graph [90.12694363549483]
AceMap is an academic system designed for knowledge discovery through an academic graph.
We present advanced database construction techniques to build the comprehensive AceMap database.
AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas.
(2024-03-05T01:17:56Z)
- An approach based on Open Research Knowledge Graph for Knowledge Acquisition from scientific papers [4.8951183832371]
Open Research Knowledge Graph (ORKG) is a computer-assisted tool to organize key-insights extracted from research papers.
It is currently used to document the "food information engineering", "Tabular data to Knowledge Graph Matching" and "Question Answering" research problems and the "Neuro-symbolic AI" domain.
(2023-08-23T20:05:42Z)
- Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction [104.29108668347727]
This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models.
The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies.
We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
(2023-07-03T16:01:45Z)
- The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
(2023-01-24T17:13:08Z)
- Citation Trajectory Prediction via Publication Influence Representation Using Temporal Knowledge Graph [52.07771598974385]
Existing approaches mainly rely on mining temporal and graph data from academic articles.
Our framework is composed of three modules: difference-preserved graph embedding, fine-grained influence representation, and learning-based trajectory calculation.
Experiments are conducted on both the APS academic dataset and our contributed AIPatent dataset.
(2022-10-02T07:43:26Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs).
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
(2020-11-05T14:57:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.